PhD thesis in Artificial Intelligence within the ANR IARISQ project (2026-2030)

Offer related to the Action/Network: not specified/Innovation

Laboratory/Company: CRIStAL UMR CNRS 9189
Duration: 36 months
Contact: hayfa.zgaya-biau@univ-lille.fr
Publication deadline: 2026-06-02

Context:
As part of the ANR IARISQ project (https://anr.fr/Project-ANR-25-CE56-3679), “CONCEPTION ET DEVELOPPEMENT D’UN SYSTEME D’AIDE A LA DECISION A BASE D’INTELLIGENCE ARTIFICIELLE POUR LA PREDICTION DE LA QUALITE DE L’AIR ET LA DETERMINATION DES RISQUES SANITAIRES DES PARTICULES” (design and development of an AI-based decision support system for predicting air quality and determining the health risks of particles), we are looking for a PhD student to work on modelling and time-series forecasting of the chemical composition of atmospheric particles, and on predicting the associated toxicity thresholds by integrating these physico-chemical variables.

Subject:
Temporal prediction of the physico-chemical composition of atmospheric particles and dynamic estimation of their toxicity thresholds using Artificial Intelligence

Candidate profile:
Holder of a Master’s degree in Artificial Intelligence, with a good command of English and strong scientific writing skills. Publication experience (a submitted and/or published article) is an asset.

Training and required skills:
– Degree in computer science with a specialization in Artificial Intelligence (Master’s or equivalent)
– Excellent software development skills (Python and associated libraries)
– Good command of symbolic and sub-symbolic AI approaches
– Experience in time-series modelling and forecasting

Work address:
UMR CRIStAL
Université de Lille – Campus scientifique
Bâtiment ESPRIT
Avenue Henri Poincaré
59655 Villeneuve d’Ascq

Attached document: 202604020557_Projet ANR IARISQ Sujet de thèse.pdf

Call for oral contributions for the EXMIA and DSChem workshops @MaDICS 2026

Date: 2026-06-02 => 2026-06-03
Location: MaDICS Symposium, Avignon

Hello,

MaDICS is a computer science GdR (research network) centred on “Masses de Données, Informations et Connaissances en Sciences” (https://www.madics.fr/). With its interdisciplinary focus, MaDICS gives a specific place to the processing of chemical and biological information, notably through the EXMIA (https://www.madics.fr/actions/exmia/) and DSChem (https://www.madics.fr/actions/dschem/) actions.

We are organizing two sessions on the processing of chemical and biological information, and more specifically on the development of multimodal models and the explainability of these models, at the next MaDICS GdR Symposium, which will take place on 2 and 3 June in Avignon.

Participation in the workshop is free; only registration is required.

You can propose your work for an oral presentation by sending an abstract to celine.robardet@insa-lyon.fr and sebastien.fiorucci@univ-cotedazur.fr before 22 April 2026.

We can fund the travel of one early-career researcher.
Applications are open until 8 May 2026 and can be sent to us by email.
Please send, in a single PDF file, your up-to-date CV (1 page max.) and a letter (1 page max.) explaining your interest in the symposium.
The grant will cover travel expenses up to a maximum of €500 to attend the 2026 MaDICS GdR symposium.

Do not hesitate to contact us at celine.robardet@insa-lyon.fr or sebastien.fiorucci@univ-cotedazur.fr

See you soon,

The EXMIA and DSChem leaders

Our website: www.madics.fr
Follow us on Twitter: @GDR_MADICS

Study, design and use of multi-source Knowledge Tracing models

Offer related to the Action/Network: not specified

Laboratory/Company: Laboratoire d’Informatique Fondamentale d’Orléans
Duration: 4 to 6 months
Contact: guillaume.cleuziou@univ-orleans.fr
Publication deadline: 2026-04-17

Context:

Subject:
Knowledge Tracing is a field of study at the intersection of Educational Data Mining (EDM), Learning Analytics (LA) and AI in Education (AIED). It covers a set of methods for modelling a learner’s knowledge by analysing their learning activities in a digital learning environment. These models are used to predict learner success and thus make it possible to design personalized learning paths (ITS, Intelligent Tutoring Systems). Today these methods rely mainly on Machine Learning models, and more particularly on deep learning, which has led to the emergence of Deep Knowledge Tracing since the work of PIECH et al. (2015).

Existing research mainly exploits learning activities that take the form of exercises, generally dedicated to acquiring a target skill, where success or failure helps estimate the learner’s level of mastery of that skill. More recently, some work has proposed exploiting not only exercises but also tutor/learner dialogues, for example from a chatbot, by means of LLMs (SCARLATOS, BAKER and LAN 2025). These promising advances build on recent progress in AI and open up new opportunities for innovation in Knowledge Tracing.
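
As a rough illustration of how exercise outcomes can be turned into a mastery estimate, here is a minimal sketch of Bayesian Knowledge Tracing, a classic probabilistic baseline that predates the deep models cited above. It is not the method targeted by the internship; the parameter values (`p_slip`, `p_guess`, `p_learn`, `p_init`) and the answer sequence are assumptions chosen only for the example.

```python
def predict_correct(p_know, p_slip=0.1, p_guess=0.2):
    """Probability that the next answer is correct, given the mastery estimate."""
    return p_know * (1 - p_slip) + (1 - p_know) * p_guess


def bkt_update(p_know, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """One tracing step: Bayesian posterior on mastery, then a learning transition."""
    if correct:
        evidence_known = p_know * (1 - p_slip)
        evidence_unknown = (1 - p_know) * p_guess
    else:
        evidence_known = p_know * p_slip
        evidence_unknown = (1 - p_know) * (1 - p_guess)
    posterior = evidence_known / (evidence_known + evidence_unknown)
    # The learner may acquire the skill after practising, whatever the outcome.
    return posterior + (1 - posterior) * p_learn


def trace(answers, p_init=0.3):
    """Run the model over a sequence of True/False answers for one skill."""
    p_know = p_init
    predictions = []
    for correct in answers:
        predictions.append(predict_correct(p_know))
        p_know = bkt_update(p_know, correct)
    return p_know, predictions


if __name__ == "__main__":
    final_mastery, predictions = trace([False, True, True, True])
    print("predicted success before each exercise:", predictions)
    print("final mastery estimate:", final_mastery)
```

Deep Knowledge Tracing replaces these hand-set probabilities with a learned recurrent model, but the interface is similar: a history of interactions goes in, a probability of success on the next exercise comes out.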

The aim of the internship is first to draw up a state of the art of the field (Knowledge Tracing), in particular a review of recent approaches that exploit tutor/learner dialogues. It will also involve studying solutions for jointly exploiting several sources of information (exercises, dialogues, activity traces, etc.) within a unified Knowledge Tracing model. An experimental study on real data is expected. To this end, the intern will work with their supervisors and the e-INSPE team:
– on setting up data collection for the courses on the platform
– on informing the users concerned by this research project about its objectives and progress
– on providing an introduction to the fundamentals of the fields involved (knowledge tracing, machine learning, deep learning)

This internship may lead to a PhD.

References

PIECH, Chris et al. (2015). “Deep knowledge tracing”. In: Advances in Neural Information Processing Systems 28.

SCARLATOS, Alexander, Ryan S. BAKER and Andrew LAN (2025). “Exploring knowledge tracing in tutor-student dialogues using LLMs”. In: Proceedings of the 15th International Learning Analytics and Knowledge Conference, p. 249-259.

Candidate profile:
You have an interest in the educational sciences.

Moodle expertise would be a plus, but training options (via the Université d’Orléans or Université de Tours and Réseau Canopé) can be arranged.

Training and required skills:
You are a Master’s or engineering school student in Computer Science.

You have a scientific background in Machine Learning and experience implementing Deep Learning models.

Work address:
DT Canopé (depending on the candidate’s place of residence); in-person meetings to be expected at LIFO (Orléans)

Attached document: 202603261753_Stage_M2_2026_eINSPE_LIFO.pdf

CfP: MACLEAN: MAChine Learning for EArth ObservatioN (workshop @ECML/PKDD2026)

Date: 2026-09-7 => 2026-06-30
Location: Naples, Italy

MACLEAN: MAChine Learning for EArth ObservatioN

https://sites.google.com/view/maclean26

September 2026

Best paper prize sponsored by ESA

KEY DATES

Paper submission deadline: June 14, 2026
Paper acceptance notification: July 14, 2026
Paper camera-ready deadline: July 30, 2026

CONTEXT

The huge amount of data currently produced by modern Earth Observation (EO) missions has raised new challenges for the Remote Sensing communities. EO sensors are now able to offer (very) high spatial resolution images with revisit frequencies never achieved before, across different kinds of signals, e.g., multi-(hyper)spectral optical, radar, LiDAR and Digital Surface Models.
In this context, modern machine learning techniques can play a crucial role in dealing with such amounts of heterogeneous, multi-scale and multi-modal data. Some examples of techniques that are gaining attention in this domain include deep learning, domain adaptation, semi-supervised approaches, time series analysis and active learning.
Even though the use of machine learning and the development of ad-hoc techniques are gaining popularity in the EO domain, a significant lack of interaction between domain experts and machine learning researchers still exists.
The objective of this workshop is to provide an international forum where machine learning researchers and domain experts can meet to exchange, debate and draw up short- and long-term research objectives around the exploitation and analysis of EO data via machine learning techniques. Among the workshop’s objectives, we want to give an overview of current machine learning research dealing with EO data and, in addition, to stimulate concrete discussions that pave the way to new machine learning frameworks especially tailored to such data.

TOPICS

– Supervised Classification of Multi(Hyper)-spectral data
– Supervised Classification of Satellite Image Time Series data
– Unsupervised Learning of EO Data
– Deep Learning approaches to deal with EO Data
– Machine Learning approaches for the analysis of multi-scale EO Data
– Machine Learning approaches for the analysis of multi-source EO Data
– Semi-supervised classification approaches for EO Data
– Active learning for EO Data
– Transfer Learning and Domain Adaptation for EO Data
– Interpretability and explainability of machine learning methods in the context of EO data analysis
– Bayesian machine learning for EO Data
– Dimensionality Reduction and Feature Selection for EO Data
– Graphical models for EO Data
– Structured output learning for EO Data
– Multiple instance learning for EO Data
– Multi-task learning for EO Data
– Online learning for EO Data
– Embedding and Latent factor for EO Data
– Foundation Models for Earth Observation
– Multi-Modal approaches for EO Data
– Self-supervised learning for EO Data

INVITED SPEAKERS:

TBA

SUBMISSION

We welcome original contributions, either theoretical or empirical, describing ongoing projects or completed work. Contributions can be of two types: either short position papers (up to 6 pages including references) or full research papers (up to 10 pages including references). Papers must be written in LNCS format, i.e., according to the ECML-PKDD 2026 submission format. Accepted contributions will be made available electronically through the Workshop web page.
Post-proceedings will also be published in the CCIS (Communications in Computer and Information Science) series.

WORKSHOP WEBSITE:

https://sites.google.com/view/maclean26

SUBMISSION WEBSITE:

https://cmt3.research.microsoft.com/ECMLPKDDWT2026/Track/10/Submission/Create

PC-CHAIRS

Thomas Corpetti, CNRS, LETG-Rennes COSTEL UMR 6554 CNRS, Rennes, France, thomas.corpetti@cnrs.fr
Roberto Interdonato, CIRAD, UMR Tetis, Montpellier, France, roberto.interdonato@cirad.fr
Cassio Fraga Dantas, INRAE, UMR Tetis, Montpellier, France, cassio.fraga-dantas@inrae.fr
Giuseppe Guarino, INRAE, UMR Tetis, Montpellier, France, dino.ienco@inrae.fr
Minh-Tan Pham, Univ. Bretagne-Sud, UMR 6074, IRISA, Vannes, France, minh-tan.pham@irisa.fr

CFP AI For Human Resources and Public Employment Services (AI4HR&PES@ECML-PKDD 2026)

Date: 2026-09-07 => 2026-09-11
Location: University of Naples Federico II

Call for Papers for the *AI4HR&PES workshop*, which will take place *in conjunction with ECML-PKDD 2026 in Naples*, Italy.

This workshop aims to explore the challenges of the contemporary job market and human resources management through data-driven solutions.

*Important Dates:*

● Paper Submission Deadline: June 5th, 2026 – AoE at 23:59

● Notification of Acceptance: June 26th, 2026 – AoE at 23:59

● Workshop date: Sept 7 or Sept 11, 2026

*Topics of Interest:*

We invite contributions on all aspects of data-driven solutions or AI in human resources and labor market contexts, including but not limited to:

● Job market analytics

● Job recommender systems

● Applications of large language models (LLMs) in HR

● Skill extraction and forecasting

● Ethical and legal issues in AI for HR management

● Inclusion of vulnerable groups in the job market

We welcome submissions from academia, industry, and government agencies, focusing on both theoretical and practical aspects of the topics mentioned above. We encourage ethical considerations to be explicitly addressed in all contributions.

Accepted paper types include:

1. Recent results published elsewhere.

2. Previously unpublished novel research results.

3. Previously unpublished position papers.

4. Previously unpublished concise surveys.

Submissions will be handled via CMT.

The submission link will be made available through the Workshop website: https://ai4hrpes.github.io/ecmlpkdd2026/

For any inquiries, please contact: ai4hrpes.ecmlpkdd@gmail.com

Call for Participation – XAIDATA: Spring School on Explainable AI for Data-Intensive Systems, 28-29 of May 2026, ETIS lab, Cergy

Date: 2026-05-28 => 2026-05-29
Location: ETIS lab, Cergy, France

Dear colleagues,

I am sharing the call for participation for our thematic school, the XAI Spring School on Explainable Artificial Intelligence, on 28-29 May 2026.

If you are a PhD student, this school is for you! Register now!

If you are a supervisor, invite your students to register!

Participation is limited to 25 people.

Registration is free (meals and coffee breaks included) but mandatory. Registration link:

https://docs.google.com/forms/d/e/1FAIpQLSdJkblxmDX-vaFRNPQ_bhQzKicOP7OxbCvehOVCIUaTp8FsKg/viewform

----------
Dear colleagues,

We are pleased to invite you to participate – and/or invite your graduate students to participate – in the XAI Spring School on Explainable Artificial Intelligence, hosted by the ETIS laboratory, at Cergy Paris University (Cergy, France).

This spring school brings together researchers, students, and practitioners interested in the foundations and applications of explainable AI (XAI), with a focus on systems with different data modalities.

Topics include (but are not limited to):
Explainable and interpretable machine learning
Trustworthy and responsible AI
Data-centric AI and evaluation
Applications of XAI in real-world and inconsistent systems
Explainability with RAGs and LLMs

Target audience:
Graduate students (Master/PhD), researchers, and industry practitioners working in AI, data science, or related fields.

Location: CYU Maisons SHS, Cergy Paris Université
Dates: 28-29 May 2026
Website: https://xaietis.github.io/

Participation is free of charge, including coffee breaks and lunches.
⚠️ Registration is mandatory (max 25 participants).

The program will include keynote talks, tutorials, and interactive sessions led by leading researchers in the field.

We kindly encourage you to share this announcement with colleagues and students who may be interested.

Best regards,
The organising committee,

Vassilis Christophides (ETIS, CNRS, ENSEA, CYU, France, IPAL Singapour)
Evi Pitoura (University of Ioannina, Greece)
Dimitris Kotzinos (ETIS, CNRS, ENSEA, CYU, France)
Katerina Tzompanaki (ETIS, CNRS, ENSEA, CYU, France)

AI FOR THE PROCESS INDUSTRIES

Date: 2026-06-11 => 2026-06-12
Location: Webinar

Paris Dauphine – PSL and SECF combine their expertise to help increase efficiency and competitiveness

Europe’s process industries sit at the center of the energy transition, the circular economy, and strategic autonomy. Between now and 2050, competitiveness will depend on turning sustainability requirements into engineering advantage: designing circular products, deploying low-carbon manufacturing at scale, and operating plants as data-driven, resilient systems. Using AI tools is becoming imperative.

Submission deadline – 15 April 2026

Acceptance notification – 30 April 2026

Early bird registration – 15 May 2026

Conference: 11 & 12 June 2026

We welcome submissions, as one-page abstracts, on the following and related topics:

1. AI for Process Optimization & Control
• Reinforcement learning for advanced process control (APC)
• AI-enhanced Model Predictive Control (MPC)
• Real-time optimization (RTO) using machine learning
• Hybrid first-principles + ML models
• Nonlinear and multivariable process modeling
• Soft sensors and virtual analyzers
• AI for batch vs. continuous process optimization
2. Digital Twins & Simulation
• AI-enabled digital twins for plants and assets
• Physics-informed neural networks (PINNs)
• Surrogate modeling for computational fluid dynamics (CFD)
• Hybrid simulation environments
• Closed-loop digital twins for operations
• Digital twins for operator training
3. Predictive Maintenance & Asset Reliability
• AI for predictive maintenance in rotating equipment
• Anomaly detection in process data
• Remaining Useful Life (RUL) estimation
• Root cause analysis using ML
• Reliability-centered maintenance with AI
• Edge AI for condition monitoring
4. Monte Carlo Tree Search
• Retrosynthesis
• Parameter optimization
• Molecule design
• Modeling complex systems
• Formula optimization
• Material Design
5. Energy Efficiency & Sustainability
• AI for energy optimization in refineries and plants
• Carbon footprint monitoring and reduction
• AI for electrification and decarbonization strategies
• Process integration and heat recovery optimization
• AI for hydrogen production and storage
• Waste minimization and circular economy optimization
6. Safety, Risk & Compliance
• AI for process safety monitoring
• Early warning systems for abnormal situations
• Hazard identification using NLP
• Risk modeling and probabilistic AI
• AI for regulatory compliance and reporting
• Cyber-physical risk detection
7. AI in Chemical & Materials Development
• AI for catalyst discovery
• Machine learning in reaction optimization
• Autonomous laboratories (self-driving labs)
• AI-guided formulation development
• Materials informatics
• Generative models for molecular design
8. AI & Industrial Data Infrastructure
• Data governance in process industries
• Time-series modeling for industrial data
• Data quality and sensor fusion
• Edge vs. cloud AI architectures
• Industrial IoT and AI integration
• MLOps for manufacturing environments
9. Generative AI & Knowledge Systems
• Large Language Models (LLMs) for plant documentation
• AI copilots for operators and engineers
• Automated report generation
• AI for troubleshooting and knowledge retrieval
• Conversational interfaces for control rooms
10. Supply Chain & Planning
• AI for production scheduling
• Demand forecasting in process industries
• Inventory optimization
• Supply chain resilience modeling
• AI-driven logistics optimization
11. Human–AI Collaboration
• AI adoption strategies in industrial environments
• Change management for AI transformation
• Trust and explainability in industrial AI
• Operator-in-the-loop AI systems
• Workforce upskilling for AI-enabled plants
12. Advanced Methods & Research Frontiers
• Causal AI for industrial systems
• Uncertainty quantification in ML models
• Transfer learning across plants
• Federated learning for industrial sites
• Multi-agent systems for plant-wide optimization
• Graph neural networks for process networks
13. Industry Case Studies
• AI deployment in refineries
• AI in specialty chemicals production
• AI in food and beverage manufacturing
• AI in pharmaceutical manufacturing
• AI in pulp & paper operations

Kind regards,

Tristan Cazenave

GreenFieldData EU Project – PhD positions – women’s applications welcome

Offer related to the Action/Network: not specified/Innovation

Laboratory/Company: LIRIS & University of Milano
Duration: 3 years
Contact: genoveva.vargas@gmail.com
Publication deadline: 2026-04-15

Context:
**********************************************************************************
GreenFieldData: agricultural practices of the future
14 EU-funded Double Diploma PhDs
https://www.eu4greenfielddata.eu/phd-positions-application/list-of-phds
**********************************************************************************

Subject:
GreenFieldData Marie Skłodowska-Curie Project is offering 14 EU-funded double-diploma PhD positions in digital agriculture at the intersection of IoT and robotics, data engineering, data management, and data analysis. The network brings together academic and non-academic partners across several countries and offers an interdisciplinary doctoral environment with joint supervision, international collaboration, and research connected to important challenges such as climate change and low-input agricultural systems.

We would particularly appreciate your support in encouraging women to apply. We are looking for scientists who can widen how agriculture is imagined, studied, and innovated. Different bold perspectives and skills matter, and we want to help build a research community that is both vibrant and inclusive.

The positions are also framed by supportive working conditions, attention to work–life balance, and room for different personal situations, which we believe are essential for meaningful and sustainable research careers.

The application deadline is 15 April 2026 (https://www.eu4greenfielddata.eu/phd-positions-application/how-to-apply)

Please feel free to circulate the poster and the call within your networks, mailing lists, associations, and communities. Your support would be extremely valuable in helping us reach more potential candidates.

With many thanks in advance for your help and solidarity,

GreenFieldData Project Publicity Board

Candidate profile:
We are looking for scientists who can widen how agriculture is imagined, studied, and innovated. Different bold perspectives and skills matter, and we want to help build a research community that is both vibrant and inclusive.

Training and required skills:

Work address:
University Claude Bernard Lyon 1, Lyon, France
University of Milano, Milano, Italy

Attached document: 202603211717_w-greenfielddata.pdf

MCF (associate professor) position, section 27 – Université d’Orléans – Data Management profile

Offer related to the Action/Network: not specified

Laboratory/Company: LIFO - Université d’Orléans
Duration: permanent
Contact: mirian@univ-orleans.fr
Publication deadline: 2026-04-04

Context:
As part of the 2026 synchronized recruitment campaign for teacher-researchers, the Université d’Orléans is opening an MCF position with a research affiliation to the LIFO laboratory.

Subject:
The person recruited will be expected to strengthen, as a priority, the Databases research axis and to join the common project of the PAMDA team (Parallélisme et Bases de Données).

Candidate profile:

Training and required skills:
Research contacts:

Sophie Robert (head of the PAMDA team): sophie.robert@univ-orleans.fr

Mirian Halfeld Ferrari (director of LIFO): direction.lifo@listes.univ-orleans.fr

Teaching contact: Laure Kahlem (head of the computer science department of the UFR ST): lkahlem@univ-orleans.fr

Work address:
Job description: https://www.univ-orleans.fr/upload/public/2026-03/260886_Poste_MCF_2026%2027%20UFR%20ST%20LIFO.pdf

Robust-to-noise information extraction, unifying challenges of optical character recognition (OCR) and automatic speech recognition (ASR)

Offer related to the Action/Network: not specified

Laboratory/Company: La Rochelle Université – Laboratoire l3i
Duration: 36 months
Contact: mickael.coustaty@univ-lr.fr
Publication deadline: 2026-04-04

Context:
The growing digitization of written and oral content has made Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR) essential in cultural heritage preservation, media accessibility, legal documentation, knowledge management and information retrieval. However, the outputs generated by these systems are inherently noisy: OCR is affected by document degradation, layout complexity or poor scanning quality, while ASR suffers from background noise, overlapping speech or non-standard oral expressions. Despite significant progress, this noise remains pervasive, and the resulting imperfections directly affect downstream natural language tasks, for which data quality is a key prerequisite. Although OCR and ASR face many similar error phenomena, their correction has mostly been studied in isolation, resulting in a lack of unified methodologies.

Subject:
Objectives:
• Compare and analyse existing post-correction methods in OCR and ASR and their potential for cross-domain adaptation.
• Develop unified approaches for post-correction that leverage the shared error patterns between OCR and ASR.
• Enable robust information extraction from noisy OCR and ASR outputs by designing strategies that mitigate the propagation of recognition errors into downstream NLP tasks.
Scientific challenges:
• Heterogeneity of noise sources: OCR errors stem from visual artifacts while ASR errors are acoustic, so a unified framework must generalize across modalities.
• Domain adaptation: OCR/ASR models often struggle on domain-specific datasets (e.g., historical texts, administrative documents, technical reports, scientific papers…), requiring correction methods that adapt to varying contexts.
• Complex error structures: beyond character and subword substitutions, OCR/ASR introduce higher-level disruptions (mis-segmentation, overlapping text blocks or speech, layout misinterpretation) that complicate correction.
• Evaluation difficulties: classical metrics such as Character Error Rate (CER) or Word Error Rate (WER) fail to fully capture the impact of errors on downstream information extraction, which calls for new evaluation methods.
• Scalability: correction methods must be applicable to large-scale corpora and adaptable to new data without full retraining.
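
To make the evaluation point concrete, the sketch below computes CER and WER from a plain Levenshtein distance and contrasts them with a toy entity-recall score. The example sentences, the `entity_recall` helper and its exact-string matching are illustrative assumptions, not part of the thesis proposal.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits divided by number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def entity_recall(entities, hypothesis):
    """Hypothetical task-oriented score: share of reference entities found verbatim."""
    found = sum(1 for entity in entities if entity in hypothesis)
    return found / len(entities)


if __name__ == "__main__":
    reference = "Paris hosted the 1900 fair"
    hypothesis = "Pariz hosted the 1900 fair"  # one OCR-like character error
    print("CER:", cer(reference, hypothesis))
    print("WER:", wer(reference, hypothesis))
    print("entity recall:", entity_recall(["Paris", "1900"], hypothesis))
```

In this example a single substituted character gives a low CER, yet one of the two reference entities can no longer be retrieved, which is the kind of gap a task-oriented metric is meant to expose.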
To tackle these challenges, the thesis will explore a combination of:
• Comparative state-of-the-art analysis: systematic benchmarking of existing OCR and ASR post-correction methods on heterogeneous corpora.
• Unified modeling approaches: leveraging neural architectures (e.g., sequence-to-sequence models, transformers, multilingual pre-trained LLMs) that can learn correction patterns across both modalities.
• Hybrid methods: integrating symbolic rules, edit distance algorithms, and domain-specific lexicons with machine learning models to improve robustness.
• Error modeling and simulation: designing artificial noise injection techniques to train models on synthetic but realistic OCR/ASR-like errors, thus improving generalization.
• Evaluation frameworks: extending standard CER/WER with task-oriented metrics reflecting the quality of downstream information extraction and retrieval.
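
The noise-injection idea in the list above can be sketched as follows: clean text is corrupted with character confusions to produce (noisy, clean) training pairs for a correction model. The confusion table, the `rate` parameter and the function names are hypothetical; a realistic simulator would estimate confusions from aligned OCR or ASR output.

```python
import random

# Hypothetical table of visually similar characters, for illustration only.
OCR_CONFUSIONS = {"l": "1", "O": "0", "e": "c", "m": "rn", "S": "5"}


def inject_ocr_noise(text, rate=0.1, seed=0):
    """Replace confusable characters with their look-alike with probability `rate`."""
    rng = random.Random(seed)  # seeded so that a corpus can be regenerated
    noisy = []
    for ch in text:
        if ch in OCR_CONFUSIONS and rng.random() < rate:
            noisy.append(OCR_CONFUSIONS[ch])
        else:
            noisy.append(ch)
    return "".join(noisy)


def make_training_pairs(sentences, rate=0.1, seed=0):
    """Build (noisy, clean) pairs; each sentence gets its own derived seed."""
    return [(inject_ocr_noise(s, rate=rate, seed=seed + i), s)
            for i, s in enumerate(sentences)]


if __name__ == "__main__":
    corpus = ["The model learns slowly", "Old maps of the Seine"]
    for noisy, clean in make_training_pairs(corpus, rate=0.5):
        print(repr(clean), "->", repr(noisy))
```

Only substitutions are simulated here; insertions, deletions, segmentation errors and ASR-style phonetic confusions would need their own rules.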
This thesis aims to help overcome the current limitations of automatic correction of texts produced by OCR and ASR systems by proposing a unified approach, which represents a significant scientific advance. An in-depth analysis of the similarities and differences between OCR and ASR errors will provide a better understanding of how these two fields can intersect. This project will enable the development of more robust methods based on multidisciplinary knowledge from natural language processing, signal processing, and image processing. The expected results will thus offer new perspectives in the development and use of multimodal language models, contributing to the evolution of generative AI in both language processing and signal processing. With the rise of multimodal databases (text, image, audio, video), this thesis could inspire the creation of tools capable of simultaneously exploiting data from various sources to extract more relevant information. The thesis is expected to help bridge the OCR and ASR research communities and open new research avenues in multimodal NLP.

Candidate profile:
The highly motivated candidate should hold a master’s degree in computer science or a related field. She/he should have a strong background in NLP with an interest in text processing and multimodal data (text, speech, document images). Familiarity with generative AI methods (e.g., large language models, text-to-text generation, deep learning, fine-tuning strategies) will be a strong asset.

Training and required skills:
Master of Science in computer science, AI or applied mathematics, or any equivalent diploma

Work address:
mickael.coustaty@univ-lr.fr

Attached document: 202603191652_Alloc_Doc_AI_DH_Coustaty_Suire_Public.pdf
