MaDICS

Modeling Human Learning through Imperfect Responses and Elicitation

Aug 31 – Sep 1 all-day

Offer linked to Action/Network: –/–

Laboratory/Company: IRISA
Duration: 3 years
Contact: constance.thierry@irisa.fr
Publication deadline: 2023-08-31

Context:
More and more learners are turning to a new kind of distance course called MOOCs (Massive Open Online Courses). For participants, this distance offers the advantage of learning anywhere and at any time, without strong geographical or time constraints. The drawback for the instructor who designs these courses is that distance learning makes it harder to estimate how well the information has been assimilated. Among existing methods for assessing a learner's knowledge acquisition, closed questions such as multiple-choice questionnaires (MCQs) are the most common. They fall into two categories: single-answer MCQs (QCM-RU) and multiple-answer MCQs (QCM-RM). In a QCM-RU, the student can choose only one answer from the proposed set. An inherent problem with QCM-RU is their rigidity, which limits respondents' ability to express themselves. These questionnaires do not let students report their ignorance, imprecision, or uncertainty. Consequently, respondents who hesitate between two (or more) answers will pick one at random. The problem in education is that a random answer does not accurately reflect the learner's actual level of knowledge.

Subject:
It is essential to define a knowledge-assessment approach whose feedback better reflects the actual learning of individuals enrolled in online courses. To this end, this thesis proposes a system in which the learner answers an assessment questionnaire, may be imprecise when hesitating, and states their certainty. The goal is to estimate the contributor's knowledge from their imperfect answers, and also to interact with them to make learning more dynamic, for example by giving feedback on their answer. Learning should thus be estimated dynamically in order to determine the moment when the learner has acquired the knowledge promised by the course. The learner's degree of knowledge acquisition must also be identified in order to optimize the feedback given to them.

Candidate profile:
Interested in modeling human contributions and in data analysis and exploitation.
Knowledge of the theory of belief functions would be a plus.
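
To illustrate how belief functions could represent an imprecise answer given with a certainty level, here is a minimal Python sketch. It is not the method of the thesis: the simple support mass function (certainty on the chosen subset, the remainder on the whole set of options) is one common modeling choice assumed here, and all function names are hypothetical.

```python
def simple_support_mass(answer, certainty, options):
    """Build a mass function from an imprecise answer and a certainty level.

    The chosen subset of options receives mass `certainty`; the remaining
    mass goes to the whole set of options (total ignorance).
    This modeling choice is an assumption made for illustration.
    """
    frame = frozenset(options)
    chosen = frozenset(answer)
    if not chosen or not chosen <= frame:
        raise ValueError("answer must be a non-empty subset of the options")
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty must lie in [0, 1]")
    if chosen == frame:
        return {frame: 1.0}  # selecting every option carries no information
    mass = {chosen: certainty, frame: 1.0 - certainty}
    return {subset: m for subset, m in mass.items() if m > 0.0}


def belief(mass, hypothesis):
    """Total mass of the focal sets included in the hypothesis."""
    target = frozenset(hypothesis)
    return sum(m for subset, m in mass.items() if subset <= target)


def plausibility(mass, hypothesis):
    """Total mass of the focal sets that intersect the hypothesis."""
    target = frozenset(hypothesis)
    return sum(m for subset, m in mass.items() if subset & target)


def pignistic(mass, option):
    """Share the mass of each focal set equally among its options."""
    return sum(m / len(subset) for subset, m in mass.items() if option in subset)


# A learner hesitates between A and B and reports 80% certainty.
m = simple_support_mass({"A", "B"}, 0.8, {"A", "B", "C", "D"})
print(belief(m, {"A"}), plausibility(m, {"A"}), pignistic(m, "A"))
```

If A is the correct answer, the gap between belief and plausibility shows how much of the learner's response remains compatible with A without committing to it, which a single forced choice cannot express.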

Required education and skills:
Engineering school or master's degree in computer science

Job location:
IUT de Lannion

Attached document: 202304271257_These_druid.pdf

Categories: theses

Open PhD position on NLP at Sorbonne University jointly with ISIR (Paris, France) and IRL – ILLS (Montreal, Canada)

Aug 31 – Sep 1 all-day

Offer linked to Action/Network: SimpleText/Doctorants

Laboratory/Company: ISIR (Paris, France) and IRL – ILLS (Montreal, Ca
Duration: 3 years
Contact: pablo.piantanida@centralesupelec.fr
Publication deadline: 2023-08-31

Context:
Large Language Models (LLMs), also known as foundation models, have greatly improved the fluency and diversity of machine-generated text. Indeed, the release of ChatGPT and GPT-4 by OpenAI has sparked global discussions on the effective use of AI-based writing assistants. However, this progress has also introduced considerable threats such as fake news and the potential for harmful outputs such as toxic or dishonest speech, among others. Research on methods for detecting the origin of a given text, in order to mitigate the dissemination of forged content and to prevent technology-aided plagiarism, appears to lag behind the rapid advancement of AI itself. For tasks like question-answering, it is essential to know when we can trust the natural language outputs of foundation models. Likewise, for tasks like machine translation, it becomes important to detect hallucinations or omissions, i.e., translations that either contain information completely unrelated to the input or that do not include some of the input information.
Recent works have indeed focused on tools that are able to spot such AI-generated outputs to identify and address these underlying risks. However, many of the existing approaches rely on pre-existing classifiers for specific undesired outputs, which restricts their applicability to situations where the harmful behavior is precisely known in advance.
Statistical analysis of lexical distributions is a valuable approach for anomaly detection in natural texts. By examining the frequency distributions of words and phrases in a given text or dataset, statistical methods can help identify unusual or anomalous patterns that deviate from the norm. These anomalies may indicate potentially harmful outputs, reveal the origin of a given text, or expose hallucinations, stylistic inconsistencies, or even malicious intent in the text. By leveraging statistical methods to analyze lexical distributions, this thesis will focus on automatically uncovering deviations and anomalies that may indicate irregularities or unexpected patterns in natural language texts.

Subject:
Forged texts and misinformation are ongoing issues, present all around us in biased software that amplifies only our own opinions for a “better”, more seamless user experience. On social media platforms, such software is used by rogue states, businesses, and individuals to create misinformation, amplify doubts about factual data, or tarnish their competitors or adversaries, thereby enhancing their own strategic or economic positions. This spread may be the result of different factors and incentives; however, each poses the same fundamental issue to humanity: the misunderstanding of what is true and what is false.

Deep learning models for large-scale text generation, such as GPT-3 and GPT-4, have seen widespread use in recent years because they outperform traditional generation methods, producing texts of great quality whose coherence and relevance are sometimes hard to distinguish from human productions. These models generate text via an auto-regressive procedure that samples from a distribution learned to mimic the "true" distribution of human-written texts. Malicious uses of these technologies thus constitute a major threat to truthful information.

Artificial text detection can be viewed as a special case of anomaly detection, broadly defined as the task of identifying examples that deviate from regular ones to a degree that arouses suspicion. Current research in anomaly detection largely focuses either on deep classifiers (e.g., out-of-distribution detection, adversarial attack) or relies on the output of large language models when labeled data is unavailable. Although these lines of research are appealing, they do not scale without a large amount of computing. Additionally, these methods make the fundamental assumptions that (1) the statistical information needed to identify anomalies is available in the trained model, and (2) the model uncertainty can be trusted, which is typically not the case, as illustrated by what happens under a small shift in the input distribution. LLM-based approaches do not perform well when used on large text fragments, as may be needed in practical applications (e.g., novel, story, or news generation), because of the fixed-length context used when training the language model.

This Ph.D. thesis focuses on developing hybrid anomaly detection methods using deep neural network-based techniques and linguistically inspired word frequency distributions. Most of the research on language models to date focuses on sentence-level processing and fails to capture long-range dependencies at the discourse level. Instead, we will leverage word frequency distributions and information measures to characterize long documents, which contain a very large number of rare words. This often leads to strange statistical phenomena, such as mean frequencies that systematically keep changing as the number of observations is increased. Advanced concepts from statistics and information measures are necessary to understand the analysis of word frequency distributions and to capture document-level information. The thesis is expected to design and develop novel statistical models and algorithms specifically tailored for analyzing lexical distributions in natural texts. Extensive experiments on real-world data sets will be carried out to showcase the viability of the approach, benchmark its performance, and analyze its advantages, limitations, and areas for improvement.
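
The remark about mean frequencies that keep changing with sample size can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the function name `frequency_profile` and the toy sentence are hypothetical, and the statistics shown (number of types, number of words seen once, mean frequency per type) are basic descriptive quantities, not the models the thesis will develop.

```python
from collections import Counter


def frequency_profile(tokens, sample_sizes):
    """Summarize the word frequency distribution of growing text prefixes.

    For each prefix length n, report the number of tokens, the number of
    distinct word types, the number of types seen exactly once, and the
    mean frequency per type (tokens / types).
    """
    rows = []
    for n in sample_sizes:
        if n <= 0:
            raise ValueError("sample sizes must be positive")
        counts = Counter(tokens[:n])
        n_tokens = sum(counts.values())
        n_types = len(counts)
        once = sum(1 for c in counts.values() if c == 1)
        rows.append(
            {
                "tokens": n_tokens,
                "types": n_types,
                "seen_once": once,
                "mean_frequency": n_tokens / n_types,
            }
        )
    return rows


# Toy example; a real study would use long documents.
text = "the cat saw the dog and the dog saw the bird".split()
for row in frequency_profile(text, [4, 8, 11]):
    print(row)
```

On real corpora the mean frequency typically does not settle as the prefix grows, because new rare words keep appearing; this is why sample-size-dependent statistics need dedicated models.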

*Research questions*
Some potential research questions for our consideration are:

• How can lexical distributions be effectively modeled and represented in natural language texts?

• What information (statistical) measures and techniques can be derived to identify anomalies in lexical distributions?

• How can contextual information and linguistic features be integrated into anomaly detection models based on lexical distributions?

• Can unsupervised learning techniques be leveraged to detect anomalies without the need for labeled anomaly data?

• How can domain-specific knowledge and expert (or mechanical) feedback be incorporated into the anomaly detection process to improve performance?

This research will provide a deeper understanding of statistical analysis techniques for anomaly detection in natural texts and contribute to the development of more accurate and reliable methods for identifying unusual patterns in language usage.

*Team supervision*
Institut des Systèmes Intelligents et de Robotique (ISIR) and the International Laboratory on Learning Systems (ILLS) are looking for a student with a background in AI and Data Science who is inspired by science and by the opportunities that data and AI offer to solve complex NLP problems. You have strong programming skills and a very good understanding of data science, statistics, and Machine Learning.

*An international and stimulating environment for research*
ILLS will promote international mobility between France and Canada to facilitate collaborations with Ph.D. students and professors in Canada. The university partners in Canada are McGill University, École de Technologie Supérieure (ÉTS), and the Quebec Artificial Intelligence Institute (Mila), which are major international players in AI. They are involved in many research, industrial, and academic projects. François Yvon, who will supervise this thesis at ISIR (Sorbonne Université), is a senior researcher at CNRS and a recognized expert in automatic language processing, machine translation, speech recognition, statistical language modeling, document mining, and learning by analogy. Prof. Pablo Piantanida, who will supervise this thesis on the ILLS (McGill – ETS – Mila) side, is a recognized expert in information theory and machine learning. One of the partners' strengths is the high international standing of the recently created International Research Laboratory ILLS of the CNRS in Montreal, which offers a highly dynamic and rich research environment in AI at large.

Candidate profile:
• Very good understanding of Machine Learning theory and techniques.
• Good programming skills in Python (PyTorch).
• Application or domain knowledge in natural language processing is a plus.
• Good communication skills in written and spoken English.
• Creativity and ability to formulate problems and solve them independently.

Required education and skills:
• MSc program in Computer Science, Machine Learning, Computer Engineering, Mathematics, or related field (e.g. applied mathematics/statistics).

Job location:
https://emploi.cnrs.fr/Offres/Doctorant/UMR7222-YVEGER-002/Default.aspx?lang=EN

Attached document: 202307091401_PhD_Topic_CNRS_ISIR_ILLS.pdf

Categories: theses

Modeling temporal, rhythmic and social synchronization with spike neural networks

Sep 2 – Sep 3 all-day

Offer linked to Action/Network: –/–

Laboratory/Company: Euromov DHM
Duration: 3 years
Contact: patrice.guyot@mines-ales.fr
Publication deadline: 2023-09-02

Context:
A 3-year fully funded PhD scholarship is offered by the PhD school (ED I2S) in Alès / Montpellier within the ANR MODPULS project.
The successful applicant will become part of a dynamic research environment within the new multidisciplinary joint research center EuroMov Digital Health in Motion.

See this offer on the EuroMov DHM website:
https://dhm.euromov.eu/wp-content/uploads/2021/06/Ph.D_MovementMusicSync.pdf

Start date: October 1st, 2023 (to September 2027).
Net remuneration around 1630€ monthly (including social security and health benefits).

A 6-month internship is also possible on the same project (March to August 2023). See this offer on the EuroMov DHM website: https://dhm.euromov.eu/wp-content/uploads/2022/12/M2_Modpuls.pdf

Subject:
The temporality of information is crucial to our understanding of the world. Synchronization between different events guides our perception and our actions in many tasks. For example, speech understanding is improved by lip-reading in a context of synchronization between visual and sound perception.
In the field of artificial intelligence, spike neural networks offer a paradigm inspired by the functioning of the human brain, which is based on the synchronization between neuronal impulses. These neural networks are likely to be more efficient than the classical neural networks used in the field of machine learning, and less costly in terms of hardware. They also offer new possibilities for processing temporal data and analyzing synchronizations.
The MODPULS project aims at studying the possibilities and the limits of the use of spike neural networks for the analysis of temporal data related to synchronization, rhythm, and human movement. Through a set of temporal and rhythmic data of different natures and complexities, combining audio, video and human motion data, you will have to implement synchronization tasks with spike neural networks. The fine analysis of synchronization mechanisms opens the field to numerous applications, notably in the human sciences with musical practice, but also in the medical field through the therapeutic analysis of social synchronizations.
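
As a rough illustration of the kind of unit such networks are built from, the following Python sketch simulates a single leaky integrate-and-fire neuron in discrete time. This is a textbook simplification chosen here for illustration, not the model used in MODPULS; the function name and the parameter values are assumptions.

```python
def lif_spike_times(input_current, threshold=1.0, leak=0.9):
    """Simulate a discrete-time leaky integrate-and-fire neuron.

    At each step the membrane potential decays by the factor `leak`,
    the input is added, and a spike is emitted (followed by a reset to
    zero) whenever the potential reaches the threshold.
    Returns the list of step indices at which the neuron spiked.
    """
    potential = 0.0
    spikes = []
    for step, current in enumerate(input_current):
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(step)
            potential = 0.0  # reset after the spike
    return spikes


# A constant weak input must accumulate over several steps before a spike.
print(lif_spike_times([0.5] * 6, threshold=0.9, leak=0.5))
```

Information is carried by when spikes occur rather than by continuous activations, so two neurons driven by synchronized inputs produce spikes at matching steps, which is what makes this paradigm a natural fit for rhythm and synchronization data.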

Candidate profile:
Applicants should have (or anticipate having) an MSc and a research background related to computer science, audio/signal processing, or computational movement science.

Required education and skills:
Knowledge of music (theoretical and practical) will be valued. French is not mandatory, but the candidate must be willing to learn French during their PhD and must be able to communicate in English.

Job location:
Alès or Montpellier

Attached document: 202302091411_Ph.D_Modpuls_Internship.pdf

Categories: theses

Trajectories of living well and aging well in one's local area

Sep 10 – Sep 11 all-day

Offer linked to Action/Network: –/Doctorants

Laboratory/Company: LIUPPA
Duration: 3 years
Contact: sebastien.laborie@iutbayonne.univ-pau.fr
Publication deadline: 2023-09-10

Context:
Living well and aging well in one's local area is a challenge recognized by the Conseil départemental des Landes, the MACS community of municipalities, and the Syndicat Mixte de St Geours de Maremne, which is in charge of developing and managing the Parc d'Aménagement Atlantisud.
It is estimated that by 2050 there will be 30,000 dependent residents of the Landes, compared with 17,000 in 2015.
Building on this observation, our project aims to help improve living environments through digitalization. It will apply to living well and aging well in the Landes. The living environment can range from the workplace for working people to housing, by way of services, leisure, and so on. The main idea is to make multi-source, massive, heterogeneous partner data speak: to analyze it, but above all to highlight gaps and produce recommendations for living better in one's area. These partners include, for example, the Département des Landes for housing, mobility, or travel data, the social landlord XL Habitat, and Hubics for building data. All of this is complemented by open data. The data will be ingested, stored, and analyzed in an eco-responsible data lake that is currently being set up.

Subject:
The goal of this thesis is to propose data analyses and visualization tools that help local decision-makers understand the territory and its evolution. Based on the collected data (massive and heterogeneous), the goal is to trace semantic trajectories representing citizens' activities (travel, access to tourism and health areas, housing, shopping, etc.), including how they evolve over time, with projections for the future.

Candidate profile:
– Very good programming skills.
– Knowledge of Semantic Web technologies.
– Data analysis (massive and heterogeneous data).
– Good level of English.

Required education and skills:
– Research Master's degree in Computer Science.
– Curiosity and the ability to put forward proposals are required qualities.

Job location:
Technopôle Domolandes (St Geours de Maremne)

Attached document: 202308130807_PhD_LIUPPA-IRIT-Domolandes.pdf

Categories: theses

Joint-supervision (co-tutelle) PhD offer (Perth, Australia; please circulate) – Note: application deadline very soon

Sep 30 – Oct 1 all-day

Offer linked to Action/Network: –/Doctorants

Laboratory/Company: University of Murdoch (Australia) – Tetis (Montpel
Duration: 36 months
Contact: maguelonne.teisseire@teledetection.fr
Publication deadline: 2023-09-30

Context:
Joint PhD position
Data-Driven Methods for Modeling the 3D Structure of Plants

Subject:
The aim of this PhD thesis is to develop data-driven techniques for modelling the 3D structure of plants and for analyzing how plant structure is affected by various intrinsic and extrinsic factors such as soil conditions and environmental factors. This is an important problem that has a wide range of applications in plant biology and agriculture. One of the main scientific challenges is to develop efficient algorithms for the extraction of features and patterns from 3D point clouds representing plant shape. Another challenge is to develop models that can simulate the growth and development of plant structures over time, taking into account various environmental factors. Another scientific question addressed in this project is how to analyze the complex relationships between plant structure and function at different scales. This involves the development of methods to measure and quantify plant traits such as biomass, leaf area, and stomatal density, and to relate these traits to plant function and performance. Overall, the project aims to advance our understanding of the structure-function relationships in plants and to provide new tools for plant breeders, ecologists, and agronomists to improve crop productivity and resilience in the face of environmental challenges.
Keywords: Deep Learning, 3D computer vision, shape analysis, geometric modelling.

Candidate profile:
Qualification: The successful candidate is expected to have an MSc degree (or equivalent) with a significant research component, completed by September 2023, and a background in image processing, computer vision, computer graphics, machine learning applied to vision, or 3D geometry processing. Students with a background in mathematics, especially 3D geometry, are highly encouraged to apply.

Required education and skills:

Experience: The ideal candidate should have some knowledge and experience in at least one of the fields listed above. The successful candidate should have strong programming skills.

As for generic competences, we seek a qualified, self-motivated professional who is open to multidisciplinary work, able to undertake independent research, and able to work in a team.

Language Skills: Fluent written and verbal communication skills in English are required.

Job location:
The candidate should also be willing to spend 18 months in Australia and 18 months in France.

Attached document: 202307101258_TETIS_Murdoch_Joint_PhD_position_2023.docx – Google Docs.pdf

Categories: theses

Interdisciplinary PhD offer, IRIT/CLLE (Toulouse)

Sep 30 – Oct 1 all-day

Offer linked to Action/Network: –/Doctorants

Laboratory/Company: IRIT/CLLE (Toulouse)
Duration: 3 years
Contact: cassia.trojahn@irit.fr
Publication deadline: 2023-09-30

Context:
The RIMO project aims to build the first FAIR, conceptual, terminological, and interdisciplinary reference framework for memory. Accessible through an open-access platform, it will support different levels of abstraction and querying. Users (the general public, practitioners, or specialists in the memory sciences) will be able, depending on their expertise, to explore memory concepts from different points of view. These points of view may be general understandings of what is meant by "memory" as well as specific theories and tools.

Subject:
The objectives are to:
(1) build an annotated scientific text corpus from texts drawn from the sub-disciplines of memory science that fall within the project's scope, relying on a research network (the GDR/Réseau Thématique Mémoire) and drawing on a cognitive theory of annotation processes;
(2) apply, to interdisciplinary corpora, recent advances in artificial intelligence, including machine learning and representation learning, in order to extract terms and the relations between them. Natural language processing techniques and knowledge extraction using neuro-symbolic approaches will also be used;
(3) build a conceptual model of the memory domain within an ontology that accounts for different points of view and levels of abstraction.

Candidate profile:
The proposed thesis project is aimed primarily at a holder of a Master's degree in computer science or computational linguistics who is interested in the topics developed by IRIT and wishes to build expertise in them (see above), while incorporating into their work expertise in psychology on dual-process models of memory (recollection, familiarity) and the associated representations (detailed representation, thematic representation).

The proposed project could also suit a holder of a Master's degree in psychology who is already interested in tools for studying memory retrieval processes and the associated representations, and who would also like to develop skills in computer science (knowledge extraction from texts, ontology construction, machine learning).

Required education and skills:
(See profile)

Job location:
Toulouse

Categories: theses

PhD position on AI/NLP at the University of Brest, France

Oct 15 – Oct 16 all-day

Offer linked to Action/Network: SimpleText/–

Laboratory/Company: Université de Bretagne Occidentale
Duration: 3 years
Contact: liana.ermakova@univ-brest.fr
Publication deadline: 2023-10-15

Context:
The University of Brest has a fully funded PhD position (3 years) on AI for scientific text simplification. The PhD position is funded by the French National Research Agency (ANR) within the SimpleText project on automatic scientific text simplification (https://anr.fr/Projet-ANR-22-CE23-0019). We collaborate with a range of partners in the context of the related CLEF SimpleText track, and the candidate is encouraged to spend time as an intern at our research or industry partners (http://simpletext-project.com/2023/clef/). The PhD position is fully funded and you will be employed by the University of Brest for three years (full-time, with all employment benefits) and are expected to complete a PhD thesis within this period.

Subject:
We seek an ambitious and highly talented PhD candidate to work on the interface of information retrieval (IR) and natural language processing (NLP) models applied to large-scale scientific text corpora. Our project aims to develop effective and efficient IR and NLP technology for promoting scientific information access, and to support non-professionals searching for scientific information in academic literature. Specifically, we deploy large language and foundation models for NLP tasks such as text simplification. There is considerable flexibility to shape the project to emerging research opportunities and the background and interests of the candidate.

Candidate profile:
We are looking for a candidate with:
– an MSc in computer science, artificial intelligence (AI) or a related field;
– a strong scientific interest in NLP, IR, and AI;
– strong academic performance in university-level courses in the relevant subjects;
– demonstrable experience with machine learning and deep learning;
– experience in programming, software development, and data science tools;
– professional command of English and good presentation skills;
– the willingness to work collaboratively with other researchers and external stakeholders.

Required education and skills:
If you feel the profile fits you, and you are interested in the job, we look forward to receiving your application. Job applications should be sent to deloor@enib.fr and liana.ermakova@univ-brest.fr. We accept applications until and including *13 October 2023*.
Applications should include the following information:
– A detailed CV including the months (not just years) when referring to your education and work experience;
– A letter of motivation explaining how the project is related to your research background;
– A list of publications (in case of joint authorship, please clearly indicate your own contribution) or a link to a writing sample available online, such as a Master’s thesis;
– A list of all Bachelor and Master-level modules you have taken, with an official transcript of grades;
– The names, affiliations, and email addresses of two academic referees who can provide details about your academic profile in relation to this position (please do not include any reference letters in your application).

Job location:
liana.ermakova@univ-brest.fr

Categories: theses

Domain-specific software development in natural language

Oct 31 – Nov 1 all-day

Offer linked to Action/Network: –/–

Laboratory/Company: IRISA (Rennes) / ALTEN (Rennes)
Duration: 3 years
Contact: zoltan.miklos@irisa.fr
Publication deadline: 2023-10-31

Context:
As a number of domains and industries go through a digital transformation, one can observe a constantly growing demand for programmers. While these industries face a shortage of available software developers, the programming tasks are very specific: they require specific domain knowledge and only a modest level of programming skills. Even though this is an important phenomenon, and basic programming skills would be desirable for the majority of professions of the 21st century, public education curricula do not address this problem sufficiently. Certain industries face a shortage of available software developers, and this problem is likely to grow.

A number of tools (https://cacm.acm.org/news/263950-no-code-ai-platforms-and-tools/fulltext), often based on artificial intelligence, are available to address this problem and enable people with little or no programming skill to become productive developers. Recently, a number of AI-based tools, such as ChatGPT, Copilot, and CodeClippy (https://github.com/CodedotAl/gpt-code-clippy/wiki), have emerged that generate code in different programming languages from natural language. These tools could greatly improve the productivity of software developers, but to use them one still needs competence in programming languages and an understanding of the generated code.

This thesis aims to develop methodologies and tools that enable or support domain specialists to engage in activities that result in executable software. Specifically, we envisage that they not only describe their programming tasks in natural language but also test and debug their software in natural language, without interacting with the code itself. We would like to develop tools and methodologies to realise this vision in two different use cases.

Subject:
We would like to develop a methodology for building domain-specific applications in natural language. The methodology should include the following aspects:

Program synthesis: Generating code from natural language descriptions for a specific target environment.

Guiding the developer in the writing phase: We would like to develop methods that guide the developer in improving the textual description of the task when it is not precise enough to generate code.

Guiding the developer in the testing/debugging phase: We will develop methodologies to correct the generated program without specific coding skills. In particular, if the developers discover some unexpected behavior in the executed code, they should be able to modify their description. For this, they also need guidance on how to change the original text. Potentially, they interact with a visual representation of the code rather than the original text, but they should be able to change the code to correct the behavior of their software.
We plan to develop methodologies and tools for two use cases: autonomous vehicle simulation and software testing. In both scenarios, the goal is to develop simple software with low complexity that requires only basic programming skills but specific domain knowledge.

Autonomous vehicles test scenarios
In this use case, we will focus on the use case of autonomous vehicles, where one needs to develop test scenarios for the driving licence of the autonomous vehicles. These sce- narios are described in a well-defined, standard language the OpenScenario 3 and Open- Road 4. These scenarios can be executed using a scenario execution software, that gene- rates a visual presentation of the defined scenario.

Software testing
In this use case we would like to develop methods and tools to support software testing. The goal is to obtain executable test scripts out of natural language descriptions of test scenarios. If the resulting test script does not correspond to the intended scenario, we user should have guidance and suggestions how to modify the text input describing the task to get the desired results.

Research questions
Program synthesis [8] is a research domain that aims to develop methods that can synthesize executable code out of high level descriptions and domain specific languages (DSLs). Researchers have proposed a variety of methods, including the use of satis- fiability or SMT solvers, reasoners, and also evolutionary computing. The most recent and advanced methods are based on the technique of neurosymbolic programming [4]. These techniques enable to combine symbolic methods to assure that the hard (and soft) constrains that correct synthesized software are satisfied, with (neural network-based) machine learning. Some important contributions in this area include [2], [5], [6], [13], [14], [16], [1] , [10]. Some of the neurosymbolic programming systems are available as open source projets, including Dreamcoder (Ellis et al. [7]).

Our planned work will use neurosymbolic techniques. While these methods enable the realisation of powerful tools, they do not address several points that are very important in our context:

3 https://www.asam.net/standards/detail/openscenario/
4 https://github.com/The-OpenROAD-Project

Interaction. We would like the user to be able to influence the generation process interactively. While some papers propose interactive synthesis, such as [18], they assume that the developer understands the synthesized code, whereas we would like the interaction to be based on natural language. Phrases in natural language can be too ambiguous to define programming tasks, so guiding the programmer may require a different representation. This could be, for example, a description of the scenario in a controlled language [12], or another representation that is easy to understand. We would like to spare the developer from having to read the code itself.
Guiding the expert in the programming phase can require a number of methods, including the identification of ambiguous parts of the programs. Other techniques could involve auto-completion. Auto-completion techniques are widely used in different areas, such as information retrieval [3] and (graph) databases [17]. We propose specific auto-completion mechanisms for this form of software development. In this context, auto-completion should take into account the specific constraints of the domain. In our work, we would like to enable developers to define certain domain knowledge in the form of constraints, and to exploit these constraints to generate the auto-completion options. Methods for generating auto-completion suggestions in the presence of constraints might include probabilistic reasoning [15] or machine learning-based techniques. Examples of the use of these techniques in other domains include [9] or [11].
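As an illustration of constraint-aware auto-completion, the following minimal Python sketch filters completion candidates with domain constraints. The vocabulary, the rule and all function names are hypothetical and chosen only for the example; the offer does not prescribe any particular mechanism, and a real system might rank candidates probabilistically rather than filter them.

```python
from typing import Callable, List

# Hypothetical vocabulary of scenario actions (illustrative, not OpenScenario keywords).
VOCABULARY = ["accelerate", "brake", "change lane left", "change lane right", "stop"]

# A constraint sees the steps written so far and one candidate next step.
Constraint = Callable[[List[str], str], bool]


def stopped_vehicle_must_accelerate(history: List[str], candidate: str) -> bool:
    """Assumed domain rule: after "stop", the only valid next action is "accelerate"."""
    if history and history[-1] == "stop":
        return candidate == "accelerate"
    return True


def complete(prefix: str, history: List[str], constraints: List[Constraint]) -> List[str]:
    """Return vocabulary entries that match the typed prefix and satisfy every constraint."""
    matches = [word for word in VOCABULARY if word.startswith(prefix)]
    return [word for word in matches if all(rule(history, word) for rule in constraints)]


rules = [stopped_vehicle_must_accelerate]
print(complete("change", ["accelerate"], rules))
print(complete("", ["accelerate", "stop"], rules))
```

Expressing each constraint as a plain predicate keeps the domain knowledge separate from the completion logic, so a domain expert could add rules without touching the matching code.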
Bibliography
[1] J. Austin, A. Odena, M. Nye, M. Bosma, H. Michalewski, D. Dohan, E. Jiang, C. Cai, M. Terry, Q. Le, and C. Sutton. Program synthesis with large language models, 2021.
[2] R. Bunel, M. J. Hausknecht, J. Devlin, R. Singh, and P. Kohli. Leveraging grammar and reinforcement learning for neural program synthesis. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018.
[3] F. Cai and M. de Rijke. A Survey of Query Auto Completion in Information Retrieval. Now Publishers Inc., Hanover, MA, USA, 2016.
[4] S. Chaudhuri, K. Ellis, O. Polozov, R. Singh, A. Solar-Lezama, and Y. Yue. Neurosymbolic programming. Foundations and Trends® in Programming Languages, 7(3):158–243, 2021.
[5] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton. A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, ICML’20. JMLR.org, 2020.
[6] M. D. Cranmer, A. Sanchez-Gonzalez, P. W. Battaglia, R. Xu, K. Cranmer, D. N. Spergel, and S. Ho. Discovering symbolic models from deep learning with inductive biases. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
[7] K. Ellis, C. Wong, M. I. Nye, M. Sablé-Meyer, L. Morales, L. B. Hewitt, L. Cary, A. Solar-Lezama, and J. B. Tenenbaum. Dreamcoder: bootstrapping inductive program synthesis with wake-sleep library learning. In S. N. Freund and E. Yahav, editors, PLDI ’21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, Virtual Event, Canada, June 20-25, 2021, pages 835–850. ACM, 2021.
[8] S. Gulwani, O. Polozov, and R. Singh. Program synthesis. Foundations and Trends® in Programming Languages, 4(1-2):1–119, 2017.
[9] N. Q. V. Hung, M. Weidlich, N. T. Tam, Z. Miklós, K. Aberer, A. Gal, and B. Stantic. Handling probabilistic integrity constraints in pay-as-you-go reconciliation of data models. Information Systems, 83:166–180, 2019. http://www.sciencedirect.com/science/article/pii/S030643791830320X.
[10] N. Jain, S. Vaidyanath, A. Iyer, N. Natarajan, S. Parthasarathy, S. Rajamani, and R. Sharma. Jigsaw: Large language models meet program synthesis. In Proceedings of the 44th International Conference on Software Engineering, ICSE ’22, pages 1219–1231, New York, NY, USA, 2022. Association for Computing Machinery.
[11] K. Kikuchi, E. Simo-Serra, M. Otani, and K. Yamaguchi. Constrained graphic layout generation via latent optimization. In Proceedings of the 29th ACM International Conference on Multimedia, MM ’21, pages 88–96, New York, NY, USA, 2021. Association for Computing Machinery.
[12] T. Kuhn. A Survey and Classification of Controlled Natural Languages. Computational Linguistics, 40(1):121–170, 03 2014.
[13] A. Murali, A. Sehgal, P. Krogmeier, and P. Madhusudan. Composing neural learning and symbolic reasoning with an application to visual discrimination. In L. D. Raedt, editor, Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23-29 July 2022, pages 3358–3365. ijcai.org, 2022.
[14] E. Parisotto, A. Mohamed, R. Singh, L. Li, D. Zhou, and P. Kohli. Neuro-symbolic program synthesis. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017.
[15] J. Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, San Francisco, Calif., 2009.
[16] R. Shin, M. Allamanis, M. Brockschmidt, and O. Polozov. Program Synthesis and Semantic Parsing with Learned Code Idioms. Curran Associates Inc., Red Hook, NY, USA, 2019.
[17] P. Yi, B. Choi, S. S. Bhowmick, and J. Xu. Autog: A visual query autocompletion framework for graph databases. Proc. VLDB Endow., 9(13):1505–1508, Sep 2016.
[18] T. Zhang, L. Lowmanstone, X. Wang, and E. L. Glassman. Interactive program synthesis by augmented examples. In Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, UIST ’20, pages 627–648, New York, NY, USA, 2020. Association for Computing Machinery.

Candidate profile:
Highly motivated and scientifically curious, with familiarity with NLP and machine learning.

Required education and skills:
Holder of a Master’s degree in Computer Science (or equivalent); very good level of French and English.

Work address:
Univ Rennes CNRS IRISA
Campus universitaire de Beaulieu
263 Avenue du General Leclerc – Bat 12 (D267)
F-35042 Rennes Cedex
France

ALTEN
12 Rue du Patis Tatelin, 35000 Rennes

Attached document: 202307201009_these_cifre_ALTEN_v2.pdf

Categories: theses

CIFRE – Cybersecurity with Machine Learning for industrial networks

Nov 30 – Dec 1 all-day

Offer related to Action/Network: – — –/– — –

Laboratory/Company: ICube laboratory – Technology & Strategy
Duration: 3 years
Contact: Lafabregue@unistra.fr
Publication deadline: 2023-11-30

Context:
Industry 4.0 is the new industrial revolution, in which objects are connected to a global network infrastructure. Fieldbuses (e.g., CAN, Modbus, TSN) interconnect the different devices and controllers. These objects are constrained in memory and computational capacity and may endanger the network infrastructure if they are compromised. They may even jeopardize the safety of industrial applications.
Thus, cybersecurity for the Industrial Internet of Things is a major concern, while most of the technologies in this area have not been designed with this problem in mind. For instance, CAN communications are neither ciphered, nor authenticated.
We need to deploy Intrusion Detection Systems able to detect anomalies, i.e., situations in which the infrastructure does not behave as expected. Such anomalies may stem from, e.g., a human misconfiguration or an attack.

Subject:
Penetration testing already exploits Machine Learning techniques to detect and identify attacks. Indeed, signature-based solutions are not sufficient, since attacks may disguise themselves as legitimate traffic flows by inserting noise.
We want to go further and identify anomalies that may be, e.g., attacks, misconfigurations or faults. Industrial networks are known to be predictable, and we must identify outliers. Some existing work considers spatial and temporal correlations, but it is application-specific, i.e., it needs to manipulate data chunks directly. Other approaches exploit an RNN to identify anomalies, but we are convinced that techniques exploiting the predictability of industrial networks should be more accurate. The network controller, which has complete knowledge of the network topology, may efficiently detect intrusions.
The first objective of this PhD thesis is to propose techniques that automatically identify patterns in the list of packets transmitted in the network infrastructure. Indeed, a networked control application relies on a control loop (sensor to controller to actuator) to control the Cyber Physical System (CPS). It is important to characterize each of these control loops (period, source / destination, correlations, etc.). The PhD student will exploit both existing datasets and the networked control system testbed deployed at Technology & Strategy.
Then, we will derive Network Intrusion Detection Systems (IDS) to identify anomalies for each of these control loops, extending what has been done for home networks or generic IP networks. We need to propose techniques to define what corresponds to a normal state, and what corresponds to an outlier / anomaly. The proposition must be sufficiently robust to detect sophisticated attacks such as Schedule-Based Attacks.
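To make the idea of characterizing control loops concrete, the sketch below learns the period of each (source, destination) flow from a packet list and flags packets that break the learned timing. It is only an illustration under simplifying assumptions: the packet format, the node names and the 20% tolerance are hypothetical, and a real IDS would need a far richer model of normal behaviour.

```python
from collections import defaultdict
from statistics import median


def characterize_flows(packets):
    """Learn the typical period of each (source, destination) flow.

    `packets` is an iterable of (timestamp_ms, source, destination) tuples
    (an assumed, simplified trace format).
    """
    times = defaultdict(list)
    for timestamp, source, destination in packets:
        times[(source, destination)].append(timestamp)
    periods = {}
    for flow, stamps in times.items():
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        if gaps:
            periods[flow] = median(gaps)  # median is robust to a few late packets
    return periods


def find_timing_anomalies(packets, periods, tolerance=0.2):
    """Flag packets from unknown flows or whose inter-arrival gap deviates
    from the learned period by more than `tolerance` (a fraction of the period)."""
    last_seen = {}
    anomalies = []
    for timestamp, source, destination in sorted(packets):
        flow = (source, destination)
        if flow not in periods:
            anomalies.append((timestamp, source, destination, "unknown flow"))
        elif flow in last_seen:
            gap = timestamp - last_seen[flow]
            if abs(gap - periods[flow]) > tolerance * periods[flow]:
                anomalies.append((timestamp, source, destination, "unexpected period"))
        last_seen[flow] = timestamp
    return anomalies


training = [(0, "sensor", "controller"), (100, "sensor", "controller"),
            (200, "sensor", "controller"), (300, "sensor", "controller"),
            (5, "controller", "actuator"), (105, "controller", "actuator"),
            (205, "controller", "actuator")]
periods = characterize_flows(training)

live = [(0, "sensor", "controller"), (100, "sensor", "controller"),
        (130, "sensor", "controller"), (250, "attacker", "controller")]
anomalies = find_timing_anomalies(live, periods)
print(anomalies)
```

Here the packet injected at 130 ms and the packet from the unknown source are both reported, while the regular 100 ms traffic is not.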

Candidate profile:
Master in computer science or similar fields, with an affinity for Machine Learning.

Required education and skills:
Applicants should have:
• Excellent knowledge of Machine Learning techniques (not only as a user);
• Excellent data science language skills (R or Python);
• Background knowledge to implement measurements in a real production line;
• Excellent communication and writing skills. Note that knowledge of French is not required for this position.
Knowledge of the following technologies is not mandatory but will be considered a plus:
• Knowledge of industrial networking protocols and stacks;
• Knowledge of embedded software.

Work address:
The PhD student will be co-hosted by Technology & Strategy and the University of Strasbourg, both located in Strasbourg, France.
Technology & Strategy was created in 2008 in Strasbourg. Specialized in Engineering, IT, Digital and Project Management, Technology & Strategy is a reference partner for its customers in the development of innovative projects. Technology & Strategy also has an integrated engineering service to meet the requirements of its customers who are primarily R&D departments of industrial companies.
With a strong international focus and a Franco-German DNA, Technology & Strategy is proud of its 1,800 employees and is present with more than 40 nationalities in 16 offices in 6 countries (France, Germany, Switzerland, Belgium, UK, South East Asia). Technology & Strategy is proud to keep its headquarters in the East of France, near Strasbourg.

Founded in the 16th century, the University of Strasbourg has a long history of excellence in higher education, rooted in Renaissance humanism. The University of Strasbourg is a public research university located in Strasbourg, with over 52,000 students. You will join the ICube laboratory, which is attached to the University.

Applications should be submitted by email to tands-cifre@icube.unistra.fr.
They must include:
• A Curriculum Vitae;
• List of 2 or 3 references to contact (position, email address);
• Transcripts of undergraduate and graduate studies;
• Link to MSc thesis, and publications if applicable;
• Link to personal software repositories (e.g. GitHub)
Please prefix the filenames of your application with your last name.

Attached document: 202303061259_202207070957_Fichier_TS-cybersec-iiot.pdf

Categories: theses

Development of a joint comparative analysis method for multi-omics data (multi-strain/multi-conditions). Application to filamentous fungi Trichoderma reesei

Nov 30 – Dec 1 all-day

Offer related to Action/Network: – — –/Doctorants

Laboratory/Company: IFP Energies nouvelles
Duration: 3 years
Contact: laurent.duval@ifpen.fr
Publication deadline: 2023-11-30

Context:
In its commitment to successfully carry out the energy transition, IFP Energies nouvelles is conducting research to optimize biotechnological processes with applications to more renewable energy sources. These processes require the use of microorganisms, for which we need to deepen our understanding of their molecular mechanisms.

Subject:
We adopt a systemic approach that considers different levels of biological regulation that interact with each other. We have gathered a set of genetic data, information on gene activity, and epigenetic imprints for our model organism Trichoderma reesei. The question that arises is: how can we detect differences in the functioning of a biological system by combining different experimental data? To answer this question, we want to develop and implement new methodologies that integrate different types of data by identifying both the fundamental systemic mechanisms that remain constant and those specific to each experimental condition. We propose to explore different statistical analysis, data processing and optimization approaches such as Bayesian methods, source separation, and deep learning through variational autoencoders. These tools will help us better understand the functioning of our microorganisms of interest in order to optimize biotechnological processes in the fields of bio-based chemistry and biofuels. This thesis is linked to the targeted project GalaxyBioProd of the PEPR B-BEST (PEPR, Programmes et Équipements Prioritaires de Recherche or Priority Research Programmes and Equipments, are aimed at constructing or consolidating French leadership in specific scientific fields). The tools developed in this thesis will be made available to the community of biologists through their integration into the Galaxy platform.

Candidate profile:
We are looking for a motivated student with strong skills in statistics, machine learning and bioinformatics. Prior experience in processing and analyzing omics data and in computer programming is highly desirable. The candidate will work closely with our team of researchers and will benefit from a stimulating environment conducive to learning and professional development.

Required education and skills:
#bioinformatics, #MachineLearning, #DataScience

Work address:
IFPEN, 92000 Rueil-Malmaison
http://laurent-duval.eu/job-2023-phd-bioinformatics-data-multi-omics-comparative-analysis.html

Attached document: 202309022113_job-phd-application-machine-learning-bioinformatic-statistics.pdf

Categories: theses

Dynamic Neural Network compression

Nov 30 – Dec 1 all-day

Offer related to Action/Network: – — –/– — –

Laboratory/Company: LIRIS
Duration: 3 years
Contact: stefan.duffner@insa-lyon.fr
Publication deadline: 2023-11-30

Context:
Research on machine learning and Deep Neural Networks (DNN) has made considerable progress in the past decades. State-of-the-art DNN models usually require large amounts of data to be trained and contain a tremendous number of parameters leading to overall high resource requirements, in terms of computation and memory and thus energy. In the past years, this gave rise to approaches to reduce these requirements, where, for example, during or after training, parts of the model are removed (pruning) or stored with lower precision (quantisation) or surrogate models are trained (knowledge distillation) or where the best configuration is searched by testing different parameters (Neural Architecture Search, NAS). Also, concerning the hardware, many optimisations have been proposed to accelerate the inference of DNNs on different architectures.
But these accelerators are usually specific to a given hardware and are optimised to satisfy certain static performance criteria. However, for many applications, the performance requirements of a DNN model deployed on a given hardware platform are not static but evolving dynamically as its operating conditions and environment change. In the context of this ANR-funded project we propose an original interdisciplinary approach that allows DNN models to be dynamically configurable at run-time on a given reconfigurable hardware accelerator architecture, depending on the external environment, following an approach based on feedback loops and control theory.

Subject:
At software (SW) level, for a given DNN model, different variants with incremental precision levels can be obtained by setting parameters along different dimensions: (i) data precision or quantization (increasing/decreasing bit-width of activations and/or weights), (ii) degree of sparsification (e.g., pruning, tensor decomposition), (iii) depth of the NN (number and type of network layers to execute). Depending on the chosen SW precision level, the mean output accuracy will change, as well as energy consumption and timing. The key observation is that, for some particularly “easy” inputs, using high-precision energy-hungry computations is an “overkill”. Conversely, for “hard” inputs, low-precision energy-efficient computations are not enough. Therefore, being able to dynamically change the SW precision is key to enable energy-efficient and accurate NN computations.
At the same time, at the hardware (HW) level, the DNN accelerator needs to be configurable at runtime to satisfy SW processing requirements. Different degrees of existing HW strategies have an impact on energy consumption and runtime, which need to be taken into account when designing NN architectures and SW compression schemes.
This PhD thesis will concentrate on the SW (i.e. machine learning) side. There are two major challenges that need to be addressed. First, an effective method is required to train different DNN model variants for the same task but responding to different performance criteria, i.e., providing different levels of classification accuracy, throughput, energy consumption, etc. Also, to reduce memory requirements, the more complex variants should share as many parameters as possible with the simpler models. Based on existing approaches and results on structured pruning and multi-exit models, one objective is to develop new algorithms and DNN architectures that can be parameterized at inference time to execute only the parts of the model that are really needed. For some layers, different parameter or activation precisions (8, 16, 32-bit) may be necessary and should be switchable at run time depending on the sensitivity of the different parts of the model, which should be quantified. The second challenge concerns the influence of these architectural changes on performance. The impact of the different possible SW configurations must be assessed in terms of the final accuracy of the obtained results and of the corresponding effort (e.g., number of operations, their datatype, potential computational overhead, etc.).
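The trade-off between precision and accuracy described above can be illustrated with a small Python sketch. It applies symmetric uniform quantization to a toy linear classifier and escalates the bit-width only when the decision margin is too small. The bit-widths, the margin threshold and the escalation policy are assumptions made for the example, not the method targeted by the thesis.

```python
def quantize(values, bits):
    """Symmetric uniform quantization: map floats to signed `bits`-bit levels and back."""
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / levels or 1.0  # avoid a zero scale
    return [round(v / scale) * scale for v in values]


def max_error(values, bits):
    """Largest absolute difference introduced by quantizing `values` to `bits` bits."""
    return max(abs(v - q) for v, q in zip(values, quantize(values, bits)))


def scores(weights, x):
    """Scores of a toy linear classifier: one weight row per class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def predict_dynamic(weights, x, bit_widths=(4, 8, 16), margin=0.1):
    """Hypothetical policy: start with the cheapest precision and move to a
    higher one only while the gap between the two best scores is below `margin`."""
    for bits in bit_widths:
        s = scores([quantize(row, bits) for row in weights], x)
        ranked = sorted(s, reverse=True)
        if ranked[0] - ranked[1] >= margin:
            break
    return s.index(ranked[0]), bits


layer = [0.12, -0.5, 0.33, 0.9, -0.07]
for bits in (2, 4, 8):
    print(bits, max_error(layer, bits))

classifier = [[1.0, 0.0], [0.0, 1.0]]
print(predict_dynamic(classifier, [1.0, 0.0]))   # clear-cut input
print(predict_dynamic(classifier, [0.52, 0.5]))  # ambiguous input
```

In this toy setting the clear-cut input is resolved at the lowest precision, whereas the ambiguous input is pushed to the highest available one, mirroring the "easy" versus "hard" inputs discussed above.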

Candidate profile:
– A strong background in machine learning and in particular neural networks
– Capacity to work autonomously and within a research team
– Scientific curiosity and creativity
– Very good English proficiency

Required education and skills:
A Master’s degree in computer science, applied mathematics or a similar field.

Work address:
INSA Lyon, laboratoire LIRIS, Lyon

Attached document: 202310032028_thesis_proposal_insa_lyon.pdf

Categories: theses

2 open PhD positions in AI and databases

Dec 1 – Dec 2 all-day

Offer related to Action/Network: – — –/Doctorants

Laboratory/Company: Télécom SudParis and Université d’Artois
Duration: 3 years
Contact: aikaterini.tzompanaki@cyu.fr
Publication deadline: 2023-12-01

Context:
The ANR project EXPIDA (EXplainable and parsimonious Preference models to get the most out of Inconsistent DAtabases; project website: https://www.cril.univ-artois.fr/expida/index.html) aims to develop principled and rigorous explainable techniques for dealing with imperfect data. More precisely, EXPIDA aims to design tractable methods for dealing with conflicts in databases by efficiently exploring novel inconsistency-tolerant semantics and quantifying contradictions [6] to answer queries and to draw (high level) explanatory information. While the set of repairs (maximal consistent datasets) is often large for real databases, we aim to explore preference mechanisms (e.g., [4]) in order to retrieve meaningful answers and explanations, to identify the reasons for query answers, and to help end-users “realize” query outputs. The EXPIDA project also aims to be useful for applications relying intensively on multiple heterogeneous data sources. Many such applications are nowadays developed in various domains such as transportation control, health management, social network analysis, data journalism, etc. This research project will advance the state of the art in two major ways: innovations in inconsistency management, preferences and explanation for databases, and the development of practical Artificial Intelligence tools for managing inconsistent databases, with validations on real data. In this context we are proposing two PhD positions.
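The notion of repairs can be made concrete with a small brute-force Python sketch. It enumerates the maximal consistent subsets of a toy table under an assumed key constraint (each person lives in at most one city) and derives the answers that hold in every repair (consistent answers) and in at least one repair (brave answers). The data, the constraint and the function names are hypothetical; the enumeration is exponential and is meant only to illustrate the definitions, not the tractable methods the project aims at.

```python
from itertools import combinations


def is_consistent(facts):
    """Assumed key constraint: each person is associated with at most one city."""
    cities = {}
    for person, city in facts:
        if cities.setdefault(person, city) != city:
            return False
    return True


def repairs(facts):
    """All maximal consistent subsets of `facts`, found by brute force.

    Subsets are explored from largest to smallest, so a consistent subset is
    maximal exactly when it is not strictly contained in a repair found earlier.
    """
    facts = list(facts)
    found = []
    for size in range(len(facts), -1, -1):
        for subset in combinations(facts, size):
            candidate = frozenset(subset)
            if is_consistent(candidate) and not any(candidate < r for r in found):
                found.append(candidate)
    return found


def answers(query, facts):
    """Return (consistent answers, brave answers) of `query` over `facts`."""
    per_repair = [set(query(repair)) for repair in repairs(facts)]
    return set.intersection(*per_repair), set.union(*per_repair)


database = [("alice", "Lens"), ("alice", "Paris"), ("bob", "Lille")]
consistent, brave = answers(lambda repair: repair, database)
print(len(repairs(database)))
print(consistent)
print(brave)
```

In this toy database the two conflicting facts about "alice" yield two repairs; only the fact about "bob" is a consistent answer, while all three facts are brave answers.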

Subject:
1st PhD position: This PhD thesis will focus on the following two main aspects. The first one concerns query answering semantics with inconsistent databases. Although various methods have been studied for drawing useful information in the presence of conflicts in the propositional and description logic settings [2, 5], we are not aware of any existing work studying rich conflict-tolerant relations in the context of databases equipped with flexible preference relations [7], such as the partially ordered relation (e.g., [1, 3]). Investigating database query answering under such varying conflict-tolerant methods should therefore be accompanied by a study of the computational complexity of the different related problems, which is among the main objectives of this thesis.
The second objective of the thesis is to handle preferences in conflicting databases. In fact, to answer queries over conflicting databases, it is crucial to express priority among potential repairs in order to select the best candidates. As the number of potential repairs can be (very) large, one may choose to rank repairs according to some preference criteria, and select a small number of the most desirable repairs. Moreover, we note that the preference among the sources of data could be of a different nature. Often, conflict-tolerant methods aim to find a stratification inducing a total preorder among all pieces of information. Such a stratification makes it easier to handle inconsistencies in data sources. Nonetheless, this can lead to a comparison of incomparable and independent pieces of information. The main objective here is then to develop a new framework to handle preferences in conflicting databases in order to draw meaningful answers to user queries.
Applications should be submitted via email to Said Jabbour (jabbour@cril.fr), Badran Raddaoui (badran.raddaoui@telecom-sudparis.eu) and Yue Ma (ma@lri.fr) with the subject “Application for EXPIDA PhD 1”.

2nd PhD position: This PhD thesis will focus on the aspect of explainability in two ways, as presented in what follows.
First, we will consider explanations for query results over inconsistent databases with different conflict-tolerant semantics (e.g., consistent, brave, intersection repair, intersection closed repair, non-objection, nonconsensus based semantics, etc.). To this end, we will adapt the notion of lineage (or provenance) [2] to the context of uncertain/inconsistent data and devise mathematical formalisations that provide the necessary properties for characterising and measuring the ‘quality’ of the explanations. Through the study of causality [7] and argumentation [4, 9, 8, 3] in our setting, we further aim at improving the acceptability and usefulness of the provided explanations for the end-user. Second, we will investigate the complementary problem of explaining missing query results, widely known as Why-Not explanations, which has not yet been addressed in the context of inconsistent databases. In the setting of consistent databases, Why-Not explanations ‘explain’ why certain results are not generated by a query (or a workflow) by means of instance-based (i.e., source tuples), query-based (i.e., query operators) or refinement-based explanations (i.e., corrected query). Close to our problem, [1] has proposed Why-Not provenance polynomials, which may account for probabilistic tuples. It would be interesting to check how such formalisations can be revisited to fit the different conflict-tolerant semantics of inconsistent databases.
Applications should be submitted via email to Badran RADDAOUI (badran.raddaoui@telecom-sudparis.eu) and Aikaterini TZOMPANAKI (aikaterini.tzompanaki@cyu.fr), with the subject “Application for EXPIDA PhD 2”.

Candidate profile:
The PhD candidate should have a background in computer science and in at least one of the following domains: databases, logic, artificial intelligence, algorithms and complexity. Candidates should be excellent students, self-motivated and eager to learn.

Required education and skills:
Engineering degree or Bachelor and Master’s degree in Computer science or (applied) mathematics.

Work address:
1st PhD: Université d’Artois, Lens, France
2nd PhD: Télécom SudParis, Palaiseau, France

Attached document: 202310032054_EXPIDA_PhD_subjects.pdf

Categories: theses

Self-Supervised Anomaly Detection in complex-valued SAR imaging

Dec 7 – Dec 8 all-day

Offer related to Action/Network: – — –/– — –

Laboratory/Company: ONERA / SONDRA, CentraleSupélec
Duration: 36 months
Contact: chengfang.ren@centralesupelec.fr
Publication deadline: 2023-12-07

Context:
Deep anomaly detection methods leverage neural networks to automatically extract crucial data features, mapping high-dimensional data into a more manageable, lower-dimensional latent space, thereby significantly enhancing anomaly detection performance. One standard method for anomaly detection is to use Autoencoders (AE) for data encoding and reconstruction, detecting anomalies based on reconstruction errors [S. Sinha, 20, S. Mabu, 21]. Due to the presence of speckle noise in SAR images, [M. Muzeau, 22] proposed to denoise SAR images using the MERLIN algorithm [E. Dalsasso, 21b] based on the noise2noise principle [J. Lehtinen, 18, E. Dalsasso, 21a]. This pre-processing step leads to better compression in the latent space, subsequently improving the detection performance. A further extension in [M. Muzeau, 23] proposed to guide the Adversarial AE (AAE) during training by filtering anomalies using an RX detector [I. S. Reed, 90].
On the other hand, self-supervised learning leverages pretext tasks to extract supervised information from unsupervised data, thereby learning valuable feature representations for downstream tasks such as classification, object detection, and segmentation [M. Caron, 21]. Self-supervised anomaly detection methods acquire data representations by creating supervised pretext tasks. The key to constructing these pretext tasks is to guide the model in learning a specialized representation suitable for anomaly detection, distinct from the general representation obtained through unsupervised learning.

Subject:
This Ph.D. aims to investigate the above-mentioned methods for SAR anomaly detection, exploiting SAR diversities: polarimetric and interferometric channels [Pottier, 09], multi-bands, and multi-looks representation [A. Mian, 19]. Particular attention is dedicated to the phase information of the complex-valued SAR images, which is crucial to assessing the spectral (range-azimuth) bandwidth and keeping the coherency in polarimetric and interferometric channels. The Ph.D. student will rely on the open-source library (https://github.com/NEGU93) previously developed in [Barrachina, 23] for complex-valued radar data and based on TensorFlow, although recent developments of the PyTorch framework now allow for processing complex-valued tensors with differentiable computational graphs. Using this library, it is possible to address and analyze any recent Machine Learning components like Autoencoders, Transformers, etc., through challenging theoretical methodologies (SAR denoising, self-supervised learning, characterization of latent spaces, etc.).

References:

• [S. Sinha, 20] S. Sinha et al., “Variational autoencoder anomaly detection of avalanche deposits in satellite SAR imagery,” in Proc. 10th Int. Conf. Climate Inform., 2020, pp. 113–119.
• [S. Mabu, 21] S. Mabu, S. Hirata, and T. Kuremoto, “Anomaly detection using convolutional adversarial autoencoder and one-class SVM for landslide area detection from synthetic aperture radar images,” J. Robot., Netw. Artif. Life, vol. 8, no. 2, pp. 139–144, 2021.
• [M. Muzeau, 22] M. Muzeau, C. Ren, S. Angelliaume, M. Datcu and J.-P. Ovarlez, “Self-Supervised Learning Based Anomaly Detection in Synthetic Aperture Radar Imaging,” in IEEE Open Journal of Signal Processing, vol. 3, pp. 440–449, 2022.
• [M. Muzeau, 23] M. Muzeau, C. Ren, S. Angelliaume, M. Datcu and J.-P. Ovarlez, “Self-Supervised SAR Anomaly Detection Guided with RX Detector,” IGARSS 2023 – 2023 IEEE International Geoscience and Remote Sensing Symposium, Pasadena, CA, USA, 2023, pp. 1918–1921.
• [J. Lehtinen, 18] J. Lehtinen et al., “Noise2Noise: Learning image restoration without clean data,” in Proc. 35th Int. Conf. Mach. Learn., 2018, vol. 80, pp. 2965–2974.
• [E. Dalsasso, 21a] E. Dalsasso, L. Denis, and F. Tupin, “SAR2SAR: A semi-supervised despeckling algorithm for SAR images,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 4321–4329, 2021.
• [E. Dalsasso, 21b] E. Dalsasso, L. Denis and F. Tupin, “As if by magic: self-supervised training of deep despeckling networks with MERLIN,” arXiv preprint arXiv:2110.13148, 2021.
• [I. S. Reed, 90] I. S. Reed and X. Yu, “Adaptive multiple-band CFAR detection of an optical pattern with unknown spectral distribution,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 10, pp. 1760–1770, 1990.
• [M. Caron, 21] M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin, “Emerging properties in self-supervised vision transformers,” in Proceedings of the International Conference on Computer Vision (ICCV), 2021.
• [A. Mian, 19] A. Mian, J.-P. Ovarlez, A. M. Atto and G. Ginolhac, “Design of New Wavelet Packets Adapted to High-Resolution SAR Images With an Application to Target Detection,” IEEE Transactions on Geoscience and Remote Sensing, 57(6), pp. 3919–3932, June 2019.
• [Pottier, 09] J.-S. Lee and E. Pottier, “Polarimetric Radar Imaging: From Basics to Applications,” CRC Press, 2009.
• [Barrachina, 23] J.-A. Barrachina, C. Ren, G. Vieillard, C. Morisseau, and J.-P. Ovarlez, “Theory and implementation of complex-valued neural networks,” arXiv preprint arXiv:2302.08286, Feb. 2023.

Candidate profile:
Master in machine learning, applied mathematics, statistics, or signal processing. Good technical skills in programming. Eager to work in the radar and SAR imaging field.

Required education and skills:
Master in machine learning, applied mathematics, statistics, or signal processing. Good technical skills in programming. Eager to work in the radar and SAR imaging field.

Work address:
The Ph.D. student will be hosted at the SONDRA laboratory (joint international laboratory between CentraleSupélec, ONERA, DSO National Laboratories, and National University of Singapore) in Paris-Saclay campus in Gif-sur-Yvette and at the MATS research unit (Advanced Methods in Signal Processing) of the Electromagnetism and Radar Department at ONERA’s Palaiseau site. Due to the international visibility of the lab, some overseas exchanges with Singapore could be easily considered. The SONDRA laboratory may finance any conference travel by the doctoral student.

Document attaché : 202312071051_Self_Supervised_Anomaly_Detection_in_complex_valued_SAR_imaging.pdf

Categories: theses

Deep Learning Meets Numerical Modelling: AI and Biophysics for Computational Cardiology

Dec 15 – Dec 16 all-day

Offer related to the Action/Network: –/–

Laboratory/Company: Epione research team, Inria Sophia Antipolis – Mé
Duration: 36 months
Contact: maxime.sermesant@inria.fr
Publication deadline: 2023-12-15

Context:
Clinical Context

Cardiac arrhythmias are a major healthcare issue. For instance, atrial fibrillation (AF) is the most common cardiac arrhythmia, characterized by chaotic electrical activation of the atria, preventing synchronized contraction. More than 6 million Europeans suffer from it, and age is the most powerful predictor of risk. Life-threatening complications and fast progression to persistent or permanent forms call for diagnosis as early as possible and for effective treatment. Arrhythmias are often treated with anti-arrhythmic drugs, which have limited efficacy and safety. Catheter ablation, an invasive procedure, is more effective. This procedure is by no means optimized, however, and arrhythmias may recur. The efficacy of first-time ablation ranges from 30% to 75% depending on the individual patient and disease, so multiple ablation procedures may be recommended.

Deep learning context

AI, and more precisely machine learning, has obtained impressive results in several domains such as vision, natural language processing, and bioinformatics. However, this data-intensive paradigm leads to models that often lack interpretability and robustness. It also does not allow easy integration of the prior knowledge available in many scientific fields. This can explain its difficult adoption in domains like healthcare. On the other hand, biophysical modelling of the human body is a well-posed mathematical framework for introducing physiology into the predictive analysis of clinical data. Moreover, it provides a natural mechanistic framework to interpret results. However, the computational cost is often large, even more so when uncertainty has to be quantified, and it is sometimes difficult to circumvent model approximations. A major scientific challenge today consists in combining the versatility of data-intensive approaches with the physically grounded modelling approaches developed in scientific fields like biophysics.

Subject:

The scientific objective of this project is to combine the advantages of biophysics and machine learning, more specifically deep learning methods, and to develop hybrid models exploiting the complementarity of the two approaches. We propose to introduce physiological priors in learning systems through biophysical modelling by learning spatiotemporal dynamics from simulations and by introducing physically motivated constraints relative to these dynamics. The objective is to exploit optimally the large amounts of data available in this field together with well-known properties of biophysical cardiac dynamics. Besides, this would also enable us to propose a data-driven correction of biophysical model error. Finally, we will seek a principled integration of uncertainty quantification within this framework. This will encompass both uncertainty on the training data and in the prediction. The vast amount of knowledge in mathematical analysis and data assimilation will be leveraged to optimise the machine learning formulation and understanding.
This project will be done in collaboration with cardiologists and radiologists to access clinical databases in order to evaluate the proposed methods on diagnosis, therapy planning and prognosis for cardiac pathologies.

Candidate profile:
Master's degree in computer science or applied mathematics, or engineering school degree. Background and experience in machine learning. Good technical skills in programming. Eager to work in the medical field.

Required education and skills:
Master's degree in computer science or applied mathematics, or engineering school degree. Background and experience in machine learning. Good technical skills in programming. Eager to work in the medical field.

Work address:
Epione Research Project

Inria, Sophia Antipolis

2004 route des Lucioles BP 93

06 902 SOPHIA ANTIPOLIS Cedex

FRANCE

Attached document: 202301120851_2023 – PhD_DeepNum-Cardiac-Electrophysiology.pdf

Categories: theses

Physics-Aware Deep Learning for Modeling Spatio-Temporal Dynamics.

Dec 15 – Dec 16 all-day

Offer related to the Action/Network: –/–

Laboratory/Company: Sorbonne Universite – ISIR – Institut des Systèmes
Duration: 36 months
Contact: patrick.gallinari@sorbonne-universite.fr
Publication deadline: 2023-12-15

Context:
Physics-aware deep learning is an emerging research field that investigates the potential of AI methods to advance scientific research on the modeling of complex natural phenomena. This research topic investigates how to leverage prior knowledge of first principles (physics) together with the ability of machine learning to extract information from data. This is a fast-growing field with the potential to boost scientific progress and to change the way we conduct research in a whole range of scientific domains. An area where this idea raises high hopes is the modeling of complex dynamics characterizing natural phenomena occurring in domains as diverse as climate science, earth science, biology, fluid dynamics, etc.

Subject:
The objective of the PhD project is the development of physics-aware deep learning methods for the modeling of complex spatio-temporal dynamics. The direct application of state-of-the-art deep learning (DL) methods for modeling and solving physical dynamics occurring in nature is limited by the complexity of the underlying phenomena, the need for large amounts of data, and the inability of these methods to learn physically consistent laws. This has motivated the recent exploration of physics-aware methods incorporating prior physical knowledge. Although promising and rapidly developing, this research field faces several challenges. This PhD project will address two main challenges: the construction of hybrid models integrating physics with DL, and generalization issues, which condition the usability of DL for physics.

— Integrating DL and physics for spatio-temporal dynamics forecasting and solving PDEs

In physics and many related fields, partial differential equations (PDEs) are the main tool for modeling and characterizing the dynamics underlying complex phenomena. Combining PDE models with ML is a natural idea when building physics-aware DL models, and it is one of the key challenges in the field. This has been explored in two main directions: (i) augmenting low-resolution solvers with ML in order to reach the accuracy of high-fidelity models at a reduced computational cost, and (ii) complementing incomplete physical models with ML by integrating observation data through machine learning. A first direction of the PhD will therefore be to investigate hybrid physics-DL models using the recently proposed framework of neural operators. The latter opens the possibility of combining and learning multiple spatio-temporal scales within a unified formalism, a challenge in DL.

— Domain generalization for deep learning based dynamical models

Explicit physical models come with guarantees and can be used in any context (also called domain or environment) where the model is valid. These models reflect explicit causal relations between the variables involved. This is not the case for DL: statistical models learn correlations from sample observations, and their validity is usually limited to the context of the training domain. This is a critical issue for the adoption of ML for modeling the physical world. In relation to the construction of hybrid models described above, the PhD will investigate this issue along two main directions. The first is a purely data-based approach that exploits ideas from learning from multiple environments through task decomposition. The second takes a dual perspective: it relies on prior physical knowledge of the system equations and directly targets the problem of solving parametric PDEs, exploiting ideas from meta-learning.

Candidate profile:
Computer science or applied mathematics. Good programming skills.

Required education and skills:
Master's degree in computer science or applied mathematics, or engineering school degree. Background and experience in machine learning.

Work address:
Sorbonne Université (S.U.), Pierre et Marie Curie Campus in the center of Paris. The candidate will join the MLIA team (Machine Learning and Deep Learning for Information Access) at ISIR (Institut des Systèmes Intelligents et de Robotique).

Attached document: 202306130923_2023-04-PhD-Description-Physics-Aware-Deep-Learning.pdf

Categories: theses

Calls for 3 PhD positions in Artificial Intelligence & Digital Humanities at University of Crete (Greece)

Dec 31 2023 – Jan 1 2024 all-day

Offer related to the Action/Network: –/–

Laboratory/Company: TALOS – University of Crete (Greece) at Rethymno
Duration: 3 years
Contact: roche@univ-savoie.fr
Publication deadline: 2023-12-31

Context:
The objective of the Talos ERA Chair Project (https://talos-ai4ssh.uoc.gr/), funded by the European Union, is to create at the University of Crete (UoC) a new research centre of excellence in Artificial Intelligence for SSH (Social Sciences and Humanities) and Digital Humanities (DH). TALOS will combine symbolic artificial intelligence, deep learning, and natural language processing to create a scalable, sustainable and explainable representation of knowledge in the Humanities and the Social Sciences.

TALOS is advertising 3 PhD positions in Artificial Intelligence for Digital Humanities.

Subject:
a) 1 PhD in “Semantic Annotation and Knowledge Graphs. Application to Classics/Ancient Greek”

Use Hybrid AI (Deep Learning, Symbolic AI, Natural Language Processing) to semantically annotate large text collections that cover vast historical time periods. Semantically annotated texts will be enriched with metadata, i.e. with references to concepts stored in knowledge graphs, including domain ontologies, for the purpose of effective data management. The objective is to produce open datasets that are shareable, searchable, findable, and linkable to external resources.

b) 1 PhD position in “Artificial Intelligence for the Preservation and Dissemination of Cultural Heritage”

The objective is to preserve and open up cultural items so that they are shareable, linkable and findable. Particular attention will be paid to scarce resources such as Linear B inscriptions. Deep learning will be used to complete inscriptions. Terminology, ontologies and knowledge graphs will be used for their representation. Expected results include the creation of an online museum and library dedicated to Linear B inscriptions, in compliance with Linked and Open Data standards.

c) 1 PhD in “Digitalisation of Education: Contribution of Artificial Intelligence to Curriculum Analysis”

The aim of the PhD is to propose a model and a digital representation of curricula that allow them to be processed by machines, for example to study their alignment with skills and activities. To this end, the contribution of artificial intelligence, natural language processing, knowledge representation and standards (ISO & W3C) will be studied. This includes ontologies which “are used with great success in education because they allow to formulate the representation of a learning domain by specifying all concepts involved, relations between concepts and all properties and conditions that exist” [Stancin et al. 2020].

Candidate profile:
Master in Computer Science or in Digital Humanities

Required education and skills:
Master in Computer Science or in Digital Humanities

Work address:
Contact: Prof Christophe Roche (christophe.roche@uoc.gr – http://christophe-roche.fr/) & Dr Maria Papadopoulou (maria.papadopoulou@uoc.gr – http://o4dh.com/maria-papadopoulou)

Categories: theses

Data Privacy on Graphs with semantic information

Dec 31 2023 – Jan 1 2024 all-day

Offer related to the Action/Network: –/–

Laboratory/Company: INSA Centre Val-de-Loire, Laboratoire LIFO
Duration: 3 years
Contact: adrien.boiret@insa-cvl.fr
Publication deadline: 2023-12-31

Context:
The Systems and Data Security Team of Laboratoire d’Informatique Fondamentale d’Orléans is offering a PhD position to study data privacy in graph databases containing semantic information.
This PhD position is part of the CyberINSA project.

Subject:
We aim to explore how privacy guarantees may weaken when graph databases respect a known schema or ontology, and to present adaptations and countermeasures.
Full subject in the attached file.

Candidate profile:
• Research Master in computer science / engineering
• Knowledge of or interest in databases (especially graph databases, e.g. RDF) and data privacy

Required education and skills:
• Ability to read and write English documents
• Proficiency in a programming language is preferred
• Willingness to work both autonomously and in a team

Work address:
INSA Centre Val de Loire, Bourges Campus
88 Boulevard Lahitolle, 18000 Bourges

Attached document: 202306202242_These-Final.pdf

Categories: theses

Bayesian approach and inverse problems to estimate galaxy properties

Jan 6 – Jan 7 all-day

Offer related to the Action/Network: –/Doctorants

Laboratory/Company: CRIStAL – UMR9189
Duration: 3 years
Contact: jenny.sorce@univ-lille.fr
Publication deadline: 2024-01-06

Context:
This interdisciplinary thesis project between data science and cosmology is part of a collaboration between the SigMA team at the CRIStAL laboratory (Lille) and the GEPI team at Paris Observatory.

The supervisory team is made up of Pierre Chainais (http://pierrechainais.ec-lille.fr/) and Jenny Sorce (https://jennygsorce.appspot.com/) (CRIStAL/SigMA) on the one hand, and Mathieu Puech (https://mathieu-puech.jimdosite.com/) and Hector Flores (Obs. de Paris / GEPI) on the other.

The thesis will be hosted at the CRIStAL laboratory (Lille) in the SigMA team (https://www.cristal.univ-lille.fr/equipes/sigma/). The SigMA team is recognized for its expertise in inverse problems and their applications to astrophysics in the broadest sense. The presence on the team of Jenny Sorce, a cosmologist, ensures a day-to-day interdisciplinary environment. Visits to the Paris Observatory are planned.

The position is located in a sector covered by the protection of scientific and technical potential (PPST) and therefore requires, in accordance with the regulations, that your arrival be authorized by the competent authority of the MESR.

Subject:
Summary (see the link to the full ad below):

The standard cosmological model postulates that dark matter and dark energy make up ~95% of the Universe. Several analyses of galaxy surveys reveal contradictions between the observations and this model. The inference of cosmological parameters from galaxy properties is the result of a complex pipeline involving astrophysical observations and theories, as well as numerical and data sciences. The debate is whether these tensions arise from new physics or from approximations leading to systematic biases. This project aims to perfect the inference pipeline using CLONES simulations provided by CRIStAL as ground truth, multi-wavelength images and galaxy spectra from the GEPI team, and the latest advances in Bayesian inference and machine learning from the CRIStAL/SigMA team. By inferring the Universe's expansion rate from surveys without bias, this project could resolve the apparent paradox of the discrepancy between the theoretical and inferred values of this rate.

Keywords: Inverse problems – Bayesian inference – Machine learning – Galaxies – Cosmology

Link to full ad: https://emploi.cnrs.fr/Offres/Doctorant/UMR9189-JENSOR-001/Default.aspx

Candidate profile:

Required education and skills:

Work address:
UMR CRIStAL
Université de Lille – Campus scientifique
Bâtiment ESPRIT
Avenue Henri Poincaré
59655 Villeneuve d’Ascq

Categories: theses

Explainability of multimodal neural models for the analysis of socio-emotional skills: application to the training of medical students

Jan 6 – Jan 7 all-day

Offer related to the Action/Network: –/–

Laboratory/Company: Inria Nancy & Paris
Duration: 36 months
Contact: emmanuel.vincent@inria.fr
Publication deadline: 2024-01-06

Context:

Subject:
https://jobs.inria.fr/public/classic/fr/offres/2024-07503

Candidate profile:

Required education and skills:

Work address:
Centre Inria de l’Université de Lorraine
615 rue du Jardin Botanique
54600 Villers-lès-Nancy

Categories: theses

Speech synthesis for Alsatian and the languages of France