Development of a joint comparative analysis method for multi-omics data (multi-strain/multi-condition). Application to the filamentous fungus Trichoderma reesei

Offer linked to Action/Network: – — –/PhD students

Laboratory/Company: IFP Energies nouvelles
Duration: 3 years
Contact: laurent.duval@ifpen.fr
Publication deadline: 2023-11-30

Context:
In its commitment to successfully carry out the energy transition, IFP Energies nouvelles is conducting research to optimize biotechnological processes with applications to more renewable energy sources. These processes require the use of microorganisms, for which we need to deepen our understanding of their molecular mechanisms.

Subject:
We adopt a systemic approach that considers different levels of biological regulation that interact with each other. We have gathered a set of genetic data, information on gene activity, and epigenetic imprints for our model organism Trichoderma reesei. The question that arises is: how can we detect differences in the functioning of a biological system by combining different experimental data? To answer this question, we want to develop and implement new methodologies that integrate different types of data by identifying both the fundamental systemic mechanisms that remain constant and those specific to each experimental condition. We propose to explore different statistical analysis, data processing and optimization approaches such as Bayesian methods, source separation, and deep learning through variational autoencoders. These tools will help us better understand the functioning of our microorganisms of interest in order to optimize biotechnological processes in the fields of bio-based chemistry and biofuels. This thesis is linked to the targeted project GalaxyBioProd of the PEPR B-BEST (PEPR, Programmes et Équipements Prioritaires de Recherche or Priority Research Programmes and Equipment, aim to build or consolidate French leadership in specific scientific fields). The tools developed in this thesis will be made available to the community of biologists through their integration into the Galaxy platform.

Candidate profile:
We are looking for a motivated student with strong skills in statistics, machine learning and bioinformatics. Prior experience in processing and analyzing omics data and in computer programming is highly desirable. The candidate will work closely with our team of researchers and will benefit from a stimulating environment conducive to learning and professional development.

Required education and skills:
#bioinformatics, #MachineLearning, #DataScience

Work address:
IFPEN, 92000 Rueil-Malmaison
http://laurent-duval.eu/job-2023-phd-bioinformatics-data-multi-omics-comparative-analysis.html

Attached document: 202309022113_job-phd-application-machine-learning-bioinformatic-statistics.pdf

CPJ/Tenure track position, AI and Climate Extremes

Offer linked to Action/Network: – — –/– — –

Laboratory/Company: Lab-STICC/IMT Atlantique
Duration: 5 years
Contact: ronan.fablet@imt-atlantique.fr
Publication deadline: 2023-09-30

Context:
Junior Professor Chair (Chaire de Professeur Junior) “AI and Climate Extremes” at IMT Atlantique/Lab-STICC/Odyssey. Successful applicants will first be hired on a ‘CDD de projet’ (fixed-term project contract), and tenure will be granted at IMT Professor level.

Location: IMT Atlantique, Brest campus

Hosting research team: INRIA team Odyssey (https://team.inria.fr/odyssey/)

Keywords: AI, applications to earth systems, extreme climate events, machine learning, dynamical systems, observing systems

Subject:
The details of the position are available here: https://institutminestelecom.recruitee.com/o/chair-of-junior-professor-in-ai-climate-extremess-contrat-projet-5-ans

Candidate profile:
See https://institutminestelecom.recruitee.com/o/chair-of-junior-professor-in-ai-climate-extremess-contrat-projet-5-ans

Required education and skills:
See https://institutminestelecom.recruitee.com/o/chair-of-junior-professor-in-ai-climate-extremess-contrat-projet-5-ans

Work address:
IMT Atlantique, Brest campus

Open science in a neuroscience laboratory: analysing changes in practices

Offer linked to Action/Network: – — –/– — –

Laboratory/Company: Centre de Recherche en Neurosciences de Lyon – CRNL
Duration: 6 months
Contact: gaelle.leroux@cnrs.fr

Context:
Among the 450 members of the Centre de Recherche en Neurosciences de Lyon (CRNL), many actively take part in the open science movement, both individually and collectively. Examples include the organisation of a workshop on “new ways of evaluating and disseminating scientific knowledge in the digital space” (02/2020), seminars, and tutorial presentations such as those on pre-registration, Git or disciplinary standards. Working groups have been formalised, notably around publications (management of the HAL collections of the Centre and its teams, an annual questionnaire on publication forms and practices since 2021) and the dissemination of science to civil society (communication and knowledge-transfer units). In early 2022, the team leaders voted unanimously to make open science a priority, and a “CRNL Plan for Open Science” (“Plan du CRNL pour la Science Ouverte”) was adopted. It describes 4 priority areas with concrete objectives for implementing open science in practice.

Subject:
Internship tasks:
• Apply the methodology of the general open science barometer to CRNL publications and analyse the results (Bracco & al. 2022)
• Quantify the CRNL’s publication fees (see OpenAPC, same period as the barometer)
• On this occasion, identify any publications in predatory journals
• Identify a collaborative tool for setting up a database listing outreach actions aimed at the general public
• Identify and analyse the obstacles to change; suggest possible solutions

Candidate profile:
Engineering-school or second-year Master’s (M2) internship

Required education and skills:
• Engineering degree programme, or a Master 1 degree in scientific and technical information and mediation
• Proficiency with office software
• Good knowledge of, or an appetite for, programming
• Knowledge of Git version control would be a plus

For this internship, the candidate must show good interpersonal skills (many people to meet in order to collect information), organisation and rigour (Git version control, a little programming to adapt from existing code) and writing skills (writing fact sheets and guides). Documentation will systematically accompany every deliverable. English is the language of the research world; a minimum B2 level is required, and C1 would be appreciated in order to communicate with the many non-French-speaking colleagues.

Work address:
CRNL
Bâtiment 462 Neurocampus Michel Jouvet – Bureau F07C
95, boulevard Pinel – 69675 Bron cedex

Attached document: 202308251030_2023_offre_stage_6mois_M2_OS_v3_DEFINITIVE.pdf

Postdoctoral researcher: geometric deep learning for surface completion

Offer linked to Action/Network: – — –/– — –

Laboratory/Company: GREYC UMR CNRS 6072
Duration: 22 months
Contact: olivier.lezoray@unicaen.fr
Publication deadline: 2024-02-01

Context:
As part of the COSURIA project (COmpletion de SURfaces par Intelligence Artificielle, i.e. surface completion with artificial intelligence), the postdoctoral researcher will design methods and algorithms for completing the geometry and colours of coloured 3D meshes.

Subject:
The postdoctoral researcher will have primary responsibility for:
– Carrying out a literature review on 3D mesh completion with generative models
– Designing and implementing a method for completing the geometry and colour of 3D meshes based on autoencoders
– Applying the developed method to the completion of coloured 3D meshes from scans of people

Candidate profile:
– Strong publication record in computer vision and/or deep learning
– In-depth knowledge of machine learning and deep learning methodologies
– Proficiency in Python (in particular deep learning frameworks) and possibly C++ programming
– Ability to write scientific reports and to present research results at conferences in English

Required education and skills:
– PhD in artificial intelligence or computer science
– Master’s or engineering degree in a field related to computer science or applied mathematics

Work address:
The GREYC laboratory (UMR CNRS 6072) is a joint research unit in digital sciences under the supervision of ENSICAEN, CNRS and Université de Caen Normandie (UNICAEN). The work will be carried out within the Image team, whose research focuses on developing new methods for processing and analysing signals, images and videos.
The position is in a sector covered by the protection of scientific and technical potential (PPST) and therefore requires, in accordance with the regulations, that your arrival be authorised by the competent authority of the MESR.

Attached document: 202308240922_COSURIA_PostDoc.pdf

Junior Professor Chair (Chaire de Professeur Junior) “AI and Industrial Digital Transition”

Offer linked to Action/Network: – — –/– — –

Laboratory/Company: LISIC / ULCO
Duration: 5 years
Contact: gilles.roussel@univ-littoral.fr
Publication deadline: 2023-09-09

Context:
Scientific context
The LISIC laboratory is a research unit of Université du Littoral Côte d’Opale. It is based at a main site in Calais and a more recent extension in Saint-Omer. LISIC has defined its identity around the theme of Digital Twins for environmental or industrial systems, and its teams contribute to this theme at different levels: modelling, perception of the environment, analysis, information fusion, multidimensional data completion, optimisation, and image synthesis. Its activities fall under the two scientific sections (27 and 61).
Many AI theories are part of the work carried out at LISIC. The needs of the industrial digital transition meet these theoretical objectives: analysing, interpreting, understanding and deciding by developing explainable methods and tools for learning, reasoning, optimisation and decision-making.
LISIC’s applications are currently mostly oriented towards the natural or human-modified environment. The laboratory wishes to continue developing its positioning towards industrial applications at the highest international level.
LISIC takes part in the CPER CornellIA of the A2U alliance in the field of AI, in which 4 areas are defined. Beyond the theoretical aspects to which the laboratory contributes, the third area aims to feed into other disciplines, industry being one of the target domains. The fourth area concerns the creation of a regional AI competence centre aimed at transferring research results from laboratories to companies. LISIC plays its part in these objectives.

Financial conditions

– Gross salary: 55 k€/year for 5 years.
– Financial support to carry out the research and teaching project (ANR funding): 200 k€, of which 60 % is intended for personnel costs.
– Additional resources from the university (ULCO): 1 co-funded PhD thesis.

After evaluation of the chair holder’s scientific achievements and professional abilities by a tenure committee, the holder may be eligible for a tenured professor position.

Application deadline (Galaxie): 8 September 2023

Subject:
Keywords: Artificial intelligence, Industry of the future, industrial digital twin, Machine Learning, Decision support.

Research project:
The junior professor chair will fall within the theme of AI and digital twins applied to Industry 4.0 under IoT constraints. Research will focus on defining decision-support models for manufacturing, energy production, transport, distribution or communication systems, among others, in order to predict their behaviour and optimise their performance and/or increase their safety.
In the context of the digital transition for factories of the future and Industry 4.0, of the energy transition, of online operation and of a dense instrumentation environment modelled on the Internet of Things, data exhibit the “3V” characteristics of Big Data: volume, variety and veracity. These constraints will influence the choices made when developing AI algorithms for some of the following goals:
– improving performance in terms of safety, reliability, resilience, maintenance, availability, etc.
– developing machine-learning-based models for the prediction and optimisation of evolving systems and for fault diagnosis in large-scale hybrid dynamical systems
– combining types of models to support predictive maintenance

Teaching project:

The junior professor will be fully involved in developing a teaching chair on AI for the digital transition of industry, in the context of the region’s CMQ Industrie et Transition Numérique and of the same region’s application to become a Campus d’Excellence, with its focus on industrial ecological efficiency. He or she would be involved in:

– leading the disciplinary modules for students of the Industrial Engineering specialty of the engineering school (EIL-Côte d’Opale) and of the Master’s in Complex Systems Engineering (Master Ingénierie des systèmes complexes, MISC);
– taking part in the educational project in the context of a ten-year outlook for the site’s specialty in the field of Industry 4.0, in connection with the objectives of the research project;
– helping to supervise students, among other things for innovation and design projects on the theme of AI and digital industry;
– contributing to scientific seminars and supervising student interns as part of training in and through research.

Candidate profile:
The candidate must hold a PhD in automatic control or computer science, with solid experience in areas of artificial intelligence such as machine learning and automated decision-making. The candidate’s excellence must be reflected in significant scientific output (publications in leading peer-reviewed journals, papers at leading peer-reviewed international conferences in the field).

Required education and skills:
The candidate must be able to manage research activities, lead national and international research projects, and supervise young researchers. The candidate must demonstrate the ability to work in a team.

Work address:
LISIC Calais & Saint-Omer – Ecole d’Ingénieurs du Littoral – Côte d’Opale (Eil-CO)

Attached document: 202308231321_CPJ LISIC-ULCO 2023 v2.pdf

Debating the potential of machine learning for astronomical surveys (#2)

Date: 2023-11-27 => 2023-12-01
Location: Institut d’Astrophysique de Paris, Paris, France

Abstract deadline: August 31st, 2023. Registration deadline: please see the note below.

Machine learning techniques are developing rapidly. After our highly successful 2021 meeting, “Machine Learning for astronomical surveys”, an avalanche of new data and an ever-growing use of ML in astronomical surveys clearly mandated a follow-up.

This year’s meeting follows the same format as before: a series of invited summary talks, short and very lively contributed talks, posters and debates. Our aim is also unchanged: to cast a critical eye on the application of machine-learning techniques in astronomical surveys, including field-level inference, likelihood-free approaches, and generative models.

The need for new data-analysis techniques for next-generation surveys is no longer in doubt, but the applicability of these techniques (often developed outside of astronomy) needs to be questioned more than ever. We aim to make this conference a forum for discussions on problems and solutions in data analysis of astronomical surveys. We are also interested in emerging classes of problems in astronomy that mandate an evolution of our data analysis techniques.

This time, to bring together as many people as possible, while limiting our carbon impact, this conference will be organised in two locations simultaneously, at the IAP in Paris and the Flatiron Institute in New York, with speakers and audiences at both sites. A professional production company will provide high-resolution live streaming video.

While the main conference and debates will be held simultaneously (in the afternoon in Paris and in the morning in NY), the time-shifted period (morning in Paris and afternoon in NY) will be devoted to in-depth review talks by leading experts in the field, including:

Miles Cranmer (Cambridge University)
Marylou Gabrié (CMAP, Polytechnique)
Tomasz Kacprzak (ETH Zurich)
Jens Jasche (Stockholm University)
Soledad Villar (JHU)
Tiziana DiMatteo (CMU)

The conference will feature three debates, organised jointly by Flatiron and IAP and hosted simultaneously at both sites:

What can machine-learning do for the next generation surveys?
What is the impact of large language models in astronomy?
Is there truth in latent space?

These debates will be led by experts in machine-learning and/or surveys, with a wide range of views, which will certainly lead to lively discussions, including:

Nabila Aghanim (IAS, Orsay)
Pierre Casenove (CNES)
Aleksandra Ciprijanovic (Fermilab)
Helena Domínguez Sánchez (CEFCA)
David Hogg (NYU / Flatiron Institute)
Kyunghyun Cho (NYU)
François Lanusse (LCS, CEA)
Luisa Lucie-Smith (MPA)
Henry Joy McCracken (IAP, Sorbonne Université)
David Spergel (Simons Foundation)
Licia Verde (ICC-UB)
Lawrence Saul (Flatiron Institute)
Torsten Ensslin (MPA)

More information can be found on the conference website; please note the preliminary timetable.

Please register for the node you wish to attend. Payment for participation in the Paris node will only be requested from mid-September.

Registration closes for New York / Flatiron on September 30th, 2023. Abstract submission closes on August 31st, 2023.

Our website: www.madics.fr
Follow us on Twitter: @GDR_MADICS

Trajectories of living well and ageing well in one’s territory

Offer linked to Action/Network: – — –/PhD students

Laboratory/Company: LIUPPA
Duration: 3 years
Contact: sebastien.laborie@iutbayonne.univ-pau.fr
Publication deadline: 2023-09-10

Context:
Living Well and Ageing Well in one’s territory is an issue recognised by the Conseil départemental des Landes, the MACS community of municipalities and the Syndicat Mixte de St Geours de Maremne, which is in charge of developing and managing the Parc d’Aménagement Atlantisud.
It is estimated that by 2050 there will be 30,000 dependent residents of the Landes, compared with 17,000 in 2015.
Building on this observation, our project aims to help improve living environments through digitalisation. It will be applied to living well and ageing well in the Landes territory. The living environment can range from the workplace for working people to housing, by way of services, leisure and so on. The main idea is to make sense of the massive, heterogeneous, multi-source data provided by partners, in order to analyse them and, above all, to highlight gaps and recommendations for living better in one’s territory. These partners include, for example, the Département des Landes for data on housing, mobility or travel, the social housing provider XL Habitat, and Hubics for building data, all complemented by open data. The data will be ingested, stored and analysed in an eco-responsible Data Lake that is currently being set up.

Subject:
The objective of this thesis is to propose data analyses and visualisation tools that help local decision-makers understand the territory and its evolution. Based on the collected data (massive and heterogeneous), the objective is to trace semantic trajectories representing citizens’ activities (travel, access to tourism and health areas, housing, shopping, etc.), including their evolution over time and projections for the future.

Candidate profile:
– Very good programming skills.
– Knowledge of Semantic Web technologies.
– Data analysis (massive and heterogeneous data).
– Good level of English.

Required education and skills:
– Research Master’s degree in Computer Science.
– Curiosity and the ability to put forward ideas are required qualities.

Work address:
Technopôle Domolandes (St Geours de Maremne)

Attached document: 202308130807_PhD_LIUPPA-IRIT-Domolandes.pdf

Development of a bibliometric analysis tool

Offer linked to Action/Network: – — –/– — –

Laboratory/Company: CNAM, CEDRIC
Duration: 3 months
Contact: sle.contact@cnam.fr
Publication deadline: 2023-09-10

Context:
The work will be carried out at the Centre d’Etudes et De Recherche en Informatique et Communications (CEDRIC) of the Conservatoire National des Arts et Métiers (CNAM). The CEDRIC laboratory’s areas of expertise include information and decision systems engineering, data mining, advanced databases, etc.

Subject:
As part of a collaboration with Université Paris 1, the aim is to study the field of smart applications. Many applications and so-called ‘smart’ devices are appearing and developing. Because this field is growing rapidly, it is not yet structured: no existing work enumerates, classifies and organises the domains concerned. The objective is to develop a tool for the bibliometric analysis of the metadata of scientific publications available in databases such as Scopus, in order to analyse the existing terminology and to establish typologies of smart domains and devices.
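
As a purely illustrative sketch of what a first terminology analysis over publication metadata could look like, the Python snippet below counts keyword frequencies and keyword co-occurrences. It is not part of the offer: the field name "Author Keywords", the ";" separator and the sample records are assumptions made for the example and would need to be adapted to the actual metadata export.

```python
from collections import Counter
from itertools import combinations


def keyword_statistics(records, field="Author Keywords", sep=";"):
    """Count keyword frequencies and pairwise co-occurrences.

    `field` and `sep` are assumptions about the metadata layout.
    """
    term_counts = Counter()
    pair_counts = Counter()
    for record in records:
        raw = record.get(field) or ""
        # Normalise case and whitespace; the set avoids counting a
        # keyword twice within the same publication.
        terms = sorted({t.strip().lower() for t in raw.split(sep) if t.strip()})
        term_counts.update(terms)
        # Sorted terms give each unordered pair a single canonical key.
        pair_counts.update(combinations(terms, 2))
    return term_counts, pair_counts


# Hypothetical metadata records, invented for illustration only.
records = [
    {"Title": "A", "Author Keywords": "Smart home; IoT; Smart device"},
    {"Title": "B", "Author Keywords": "smart device; Wearable; IoT"},
    {"Title": "C", "Author Keywords": ""},
]

terms, pairs = keyword_statistics(records)
print(terms.most_common(2))
print(pairs[("iot", "smart device")])
```

Frequent terms and frequently co-occurring pairs are one possible starting point for grouping terminology into candidate typologies; other groupings (by year, venue or abstract text) would follow the same pattern.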

To apply, please send your application before 1 September 2023 to sle.contact@cnam.fr. The application must include:
– An up-to-date curriculum vitae,
– A cover letter,
– Academic transcripts,
– Optionally, one or more letters of recommendation.

Candidate profile:
Analytical skills, writing skills.

Required education and skills:
Bac+3/Bac+5 in Computer Science
Python desirable, conceptual modelling, algorithmics.

Work address:
2, rue Conté, Paris 75003

Bioinformatician (M/F) – Description, Storage, and Standardization of Datasets and Workflows

Offer linked to Action/Network: – — –/– — –

Laboratory/Company: Institut Pasteur
Duration: 23 months
Contact: herve.menager@pasteur.fr
Publication deadline: 2023-10-31

Context:
The ShareFAIR project (PEPR “Digital Health”) aims to promote the sharing and exchange of health data and their analysis, with a focus on interoperability, reusability, and transparency.

Bioinformatic analyses are complex and rely on various tools that need to be configured and chained together. In this context, improving the reproducibility of the obtained results is of paramount importance, especially in the field of health. This is typically achieved through the design, implementation, and execution of workflows (e.g. Snakemake, Nextflow), which offer numerous advantages, such as improving the reproducibility of analyses and better tracking of data provenance.

These workflows are generally scattered across public repositories, poorly annotated, and difficult to query. Challenges, therefore, include the standardization and annotation of datasets and workflows, as well as their synthesis into interoperable, shareable, and reusable workflows.

Subject:
Within the scope of this project, we are seeking an engineer specialized in bioinformatics workflows, data, and knowledge engineering to contribute to the definition and implementation of standards and best practices to achieve these objectives. The successful candidate will work closely with a multidisciplinary team, including bioinformatics researchers and engineers, developers, and data management experts.

Main responsibilities:
– Identification of standards for the representation and annotation of workflows:
  – Perform an in-depth analysis of existing standards such as RO-Crate, EDAM, and others that are relevant.
  – Evaluate their applicability to the specific needs of the ShareFAIR project.
  – Recommend and justify appropriate choices of standards for the representation and annotation of workflows.
– Construction of a knowledge base integrating the identified standards:
  – Design and implement an infrastructure for the creation of a consolidated knowledge base, using the selected standards.
  – Develop automated pipelines for the integration and management of data from different sources.
  – Collaborate with the team to ensure the quality, consistency, and accuracy of data in the knowledge base.
– Adaptation and improvement of concepts borrowed from standards:
  – Examine the scope and limitations (in terms of quality and coverage) of the identified standards.
  – Propose improvements and adaptations to meet the specific needs of the ShareFAIR project.
  – Implement these improvements in collaboration with the development team.
– Depending on the profile, assume the role of project manager.

Candidate profile:
Master’s degree (Bac +5) in computer science or bioinformatics.

The Hub of Bioinformatics and Biostatistics and Institut Pasteur are committed to promoting gender equality, and female candidates are encouraged to apply.

Required education and skills:
– Proficiency in Python and/or Java for software development.
– Solid knowledge of databases, including SQL and/or NoSQL.
– Familiarity with knowledge representation formats such as JSON and RDF.
– Understanding of ontologies and bioinformatics workflows (an advantage).
– Ability to work independently and collaborate effectively within a multidisciplinary team.
– Good communication and documentation skills.
– Proficiency in professional English.

Work address:
Institut Pasteur, Paris, France

Domain-specific software development in natural language

Offer linked to Action/Network: – — –/– — –

Laboratory/Company: IRISA (Rennes) / ALTEN (Rennes)
Duration: 3 years
Contact: zoltan.miklos@irisa.fr
Publication deadline: 2023-10-31

Context:
As a number of domains and industries go through a digital transformation, demand for programmers keeps growing. While these industries face a shortage of available software developers, their programming tasks are very specific: they require domain knowledge and only a modest level of programming skill. Even though this is an important phenomenon, and basic programming skills would be desirable for the majority of 21st-century professions, public education curricula do not address this problem sufficiently. The shortage of software developers in certain industries is therefore likely to worsen.

A number of tools (1), often based on artificial intelligence, are available to address this problem and enable people with little or no programming skill to become productive developers. Recently, a number of AI-based tools (ChatGPT, Copilot, CodeClippy (2), etc.) have emerged that generate code in different programming languages from natural language. These tools could greatly improve the productivity of software developers, but using them still requires competence in programming languages and an understanding of the generated code.

This thesis aims to develop methodologies and tools that enable or support domain specialists in activities that result in executable software. Specifically, we envisage that they not only describe their programming tasks in natural language but also test and debug their software in natural language, without interacting with the code itself. We would like to develop tools and methodologies to realise this vision in two different use cases.

Subject:
We would like to develop a methodology for building domain-specific applications in natural language. The methodology should include the following aspects:

Program synthesis: generating code from natural language descriptions for a specific target environment.

Guiding the developer in the writing phase: we would like to develop methods that guide the developer in improving the textual description of the task when it is not precise enough to generate code.
(1) https://cacm.acm.org/news/263950-no-code-ai-platforms-and-tools/fulltext
(2) https://github.com/CodedotAl/gpt-code-clippy/wiki

Guiding the developer in the testing/debugging phase: we will develop methodologies for correcting the generated program without specific coding skills. In particular, if the developers discover unexpected behavior in the executed code, they should be able to modify their description. For this, they also need guidance on how to change the original text. Potentially, they interact with a visual representation of the code rather than the original text, but they should be able to change the code to correct the behavior of their software.
We plan to develop methodologies and tools for two use cases: autonomous vehicle simulation and software testing. In both scenarios, the goal is to develop simple, low-complexity software that requires only basic programming skills but specific domain knowledge.

Autonomous vehicles test scenarios
In this use case, we focus on autonomous vehicles, for which one needs to develop test scenarios for the driving licence of the autonomous vehicle. These scenarios are described in well-defined, standard languages, OpenScenario (3) and OpenRoad (4). They can be executed using scenario execution software, which generates a visual presentation of the defined scenario.

Software testing
In this use case, we would like to develop methods and tools to support software testing. The goal is to obtain executable test scripts from natural language descriptions of test scenarios. If the resulting test script does not correspond to the intended scenario, the user should receive guidance and suggestions on how to modify the text input describing the task to get the desired results.

Research questions
Program synthesis [8] is a research domain that aims to develop methods that can synthesize executable code out of high level descriptions and domain specific languages (DSLs). Researchers have proposed a variety of methods, including the use of satis- fiability or SMT solvers, reasoners, and also evolutionary computing. The most recent and advanced methods are based on the technique of neurosymbolic programming [4]. These techniques enable to combine symbolic methods to assure that the hard (and soft) constrains that correct synthesized software are satisfied, with (neural network-based) machine learning. Some important contributions in this area include [2], [5], [6], [13], [14], [16], [1] , [10]. Some of the neurosymbolic programming systems are available as open source projets, including Dreamcoder (Ellis et al. [7]).

Footnote 3: https://www.asam.net/standards/detail/openscenario/
Footnote 4: https://github.com/The-OpenROAD-Project

Our planned work will use neurosymbolic techniques. While these methods make it possible to build powerful tools, they do not address several points that are very important in our context:

Interaction. We would like the user to be able to influence the generation process interactively. While some papers propose interactive synthesis, such as [18], they assume that the developer understands the synthesized code, whereas we would like the interaction to be based on natural language. Phrases in natural language can be too ambiguous to define programming tasks. To guide the programmer, we might therefore need to rely on a different representation. This could be, for example, a description of the scenario in a controlled language [12], or another representation that is easy to understand. We would like to avoid requiring the developer to read the code itself.
Guiding the expert in the programming phase can require a number of methods, including the identification of ambiguous parts of the programs. Other techniques could involve auto-completion, which is widely used in areas such as information retrieval [3] and (graph) databases [17]. We propose specific auto-completion mechanisms for this form of software development. In this context, auto-completion should take into account the specific constraints of the domain. In our work, we would like to enable developers to define certain domain knowledge in the form of constraints, and to exploit these constraints to generate the auto-completion options. Methods for generating auto-completion suggestions in the presence of constraints might include probabilistic reasoning [15] or machine learning-based techniques. Examples of the use of these techniques in other domains include [9] and [11].
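To make the idea of constraint-aware auto-completion concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the method the thesis will develop: the scenario format (a dictionary of slots), the function `suggest`, and the two example constraints are all hypothetical. It simply filters candidate values for a slot, keeping those that violate no domain constraint; a probabilistic or learned model could then rank the remaining options.

```python
from typing import Callable, Dict, List

# Hypothetical representation: a partially written scenario is a dict
# mapping slot names to the values chosen so far.
Scenario = Dict[str, object]
Constraint = Callable[[Scenario], bool]


def urban_speed_limit(s: Scenario) -> bool:
    # Assumed domain rule: at most 50 km/h on urban roads.
    # The rule only applies once both slots are filled.
    if s.get("road") == "urban" and "speed_kmh" in s:
        return s["speed_kmh"] <= 50
    return True


def no_pedestrian_on_motorway(s: Scenario) -> bool:
    # Assumed domain rule: pedestrians never appear on motorways.
    return not (s.get("road") == "motorway" and s.get("actor") == "pedestrian")


CONSTRAINTS: List[Constraint] = [urban_speed_limit, no_pedestrian_on_motorway]


def suggest(partial: Scenario, slot: str, candidates: List[object],
            constraints: List[Constraint]) -> List[object]:
    """Return the candidates for `slot` that keep all constraints satisfied."""
    valid = []
    for value in candidates:
        trial = {**partial, slot: value}  # tentative completion
        if all(check(trial) for check in constraints):
            valid.append(value)
    return valid


if __name__ == "__main__":
    print(suggest({"road": "urban"}, "speed_kmh", [30, 50, 90, 130], CONSTRAINTS))
    print(suggest({"road": "motorway"}, "actor", ["car", "truck", "pedestrian"], CONSTRAINTS))
```

In this sketch the constraints are hard filters. Soft constraints or learned preferences would instead contribute a score used to order the suggestions.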
Bibliography
[1] J. Austin, A. Odena, M. Nye, M. Bosma, H. Michalewski, D. Dohan, E. Jiang, C. Cai, M. Terry, Q. Le, and C. Sutton. Program synthesis with large language models, 2021.
[2] R. Bunel, M. J. Hausknecht, J. Devlin, R. Singh, and P. Kohli. Leveraging grammar and reinforcement learning for neural program synthesis. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018.
[3] F. Cai and M. de Rijke. A Survey of Query Auto Completion in Information Retrieval. Now Publishers Inc., Hanover, MA, USA, 2016.
[4] S. Chaudhuri, K. Ellis, O. Polozov, R. Singh, A. Solar-Lezama, and Y. Yue. Neurosymbolic programming. Foundations and Trends® in Programming Languages, 7(3):158–243, 2021.
[5] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton. A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, ICML’20. JMLR.org, 2020.
[6] M. D. Cranmer, A. Sanchez-Gonzalez, P. W. Battaglia, R. Xu, K. Cranmer, D. N. Spergel, and S. Ho. Discovering symbolic models from deep learning with inductive biases. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
[7] K. Ellis, C. Wong, M. I. Nye, M. Sablé-Meyer, L. Morales, L. B. Hewitt, L. Cary, A. Solar-Lezama, and J. B. Tenenbaum. Dreamcoder: bootstrapping inductive program synthesis with wake-sleep library learning. In S. N. Freund and E. Yahav, editors, PLDI ’21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, Virtual Event, Canada, June 20-25, 2021, pages 835–850. ACM, 2021.
[8] S. Gulwani, O. Polozov, and R. Singh. Program synthesis. Foundations and Trends® in Programming Languages, 4(1-2):1–119, 2017.
[9] N. Q. V. Hung, M. Weidlich, N. T. Tam, Z. Miklós, K. Aberer, A. Gal, and B. Stantic. Handling probabilistic integrity constraints in pay-as-you-go reconciliation of data models. Information Systems, 83:166–180, 2019. http://www.sciencedirect.com/science/article/pii/S030643791830320X.
[10] N. Jain, S. Vaidyanath, A. Iyer, N. Natarajan, S. Parthasarathy, S. Rajamani, and R. Sharma. Jigsaw: Large language models meet program synthesis. In Proceedings of the 44th International Conference on Software Engineering, ICSE ’22, pages 1219–1231, New York, NY, USA, 2022. Association for Computing Machinery.
[11] K. Kikuchi, E. Simo-Serra, M. Otani, and K. Yamaguchi. Constrained graphic layout generation via latent optimization. In Proceedings of the 29th ACM International Conference on Multimedia, MM ’21, pages 88–96, New York, NY, USA, 2021. Association for Computing Machinery.
[12] T. Kuhn. A Survey and Classification of Controlled Natural Languages. Computational Linguistics, 40(1):121–170, 03 2014.
[13] A. Murali, A. Sehgal, P. Krogmeier, and P. Madhusudan. Composing neural learning and symbolic reasoning with an application to visual discrimination. In L. D. Raedt, editor, Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23-29 July 2022, pages 3358–3365. ijcai.org, 2022.
[14] E. Parisotto, A. Mohamed, R. Singh, L. Li, D. Zhou, and P. Kohli. Neuro-symbolic program synthesis. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017.
[15] J. Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, San Francisco, Calif., 2009.
[16] R. Shin, M. Allamanis, M. Brockschmidt, and O. Polozov. Program Synthesis and Semantic Parsing with Learned Code Idioms. Curran Associates Inc., Red Hook, NY, USA, 2019.
[17] P. Yi, B. Choi, S. S. Bhowmick, and J. Xu. Autog: A visual query autocompletion framework for graph databases. Proc. VLDB Endow., 9(13):1505–1508, Sep. 2016.
[18] T. Zhang, L. Lowmanstone, X. Wang, and E. L. Glassman. Interactive program synthesis by augmented examples. In Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, UIST ’20, pages 627–648, New York, NY, USA, 2020. Association for Computing Machinery.

Candidate profile:
Highly motivated, scientifically curious, familiar with NLP and machine learning

Required education and skills:
Master's degree in Computer Science (or equivalent), very good level of French and English

Work address:
Univ Rennes CNRS IRISA
Campus universitaire de Beaulieu
263 Avenue du General Leclerc – Bat 12 (D267)
F-35042 Rennes Cedex
France

ALTEN
12 Rue du Patis Tatelin, 35000 Rennes

Attached document: 202307201009_these_cifre_ALTEN_v2.pdf
