Internship/PhD offer: GenAI for causality

Offer related to the Action/Network: – — –/PhD students

Laboratory/Company: CRAN-Université de Lorraine
Duration: 36 months
Contact: marianne.clausel@univ-lorraine.fr
Publication deadline: 2025-02-01

Context:
Causality, and more generally eXplainable AI (XAI), is one of the hottest topics in the AI scientific community, with many applications in medicine, materials science, the environment, marketing, etc.
We invite applications for a PhD position within the CAUSALI-T-AI project of the PEPR IA, funded by the ANR (2023-2029), on generative AI for causality (more details below). The thesis will take place in the Simul Research Group of the Centre de Recherche en Automatique de Nancy (CRAN). International scientific collaborations with the US may also be planned: we have strong connections with the research group of T. Adali, who leads the Machine Learning for Signal Processing Laboratory at the University of Maryland (Baltimore, USA).

Topic:
In machine learning, generative models make it possible to model the probabilistic behavior of a wide range of physical systems, with applications in finance, medical imaging and climate science, among others. They can be used to generate new data, but also to infer latent (unobserved) quantities (e.g., the presence or absence of a disease) and to solve other tasks such as time-series forecasting.
Recent advances in generative models have been strongly driven by progress in deep neural networks, leading to excellent experimental performance. However, they also share one of the main shortcomings of deep learning methods: their behavior is still poorly understood theoretically (e.g., uniqueness of representations, approximation capability, generalization).
Such theoretical guarantees are very important when generative models are used for inference tasks: in applications such as medicine, it is crucial to know that a representation learned by the model is unique or stable. On the one hand, models that are unique, such as (nonlinear) structural equation models (SEMs), can reveal the causal mechanisms of the underlying physical system. On the other hand, such results also guarantee that statistical inference or forecasting results will not change significantly (which could alter a diagnosis) because of, e.g., different initializations of an optimization algorithm or small changes in the selected hyperparameters. Discovering a unique generative model in the presence of unobserved latent factors of variation is also a cornerstone of other aspects of causal reasoning, for instance the computation of counterfactuals.
The general goal of this project is to solve practical statistical inference tasks using generative models while:
1) from a modeling perspective, addressing some nonidealities in the data, such as nonstationarity, or differences between the statistical distributions or acquisition conditions of different measurements (e.g., precipitation data acquired over different geographical locations, medical data from different groups of subjects);
2) from an algorithmic perspective, developing solutions that take these nonidealities into account and provide better results when such conditions occur in practice; and
3) studying the theoretical properties of the models in a general setting, including uniqueness, stability and approximation capability, investigating different hypotheses that can support such results as well as some choices related to the algorithm (e.g., what properties, such as rank conditions, the weight matrices of the neural networks need to satisfy, or what needs to be assumed about the distribution of the data).
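As an illustration of why uniqueness is non-trivial (a classical textbook example, not the project's model), consider a linear latent model x = A z with Gaussian latent factors z ~ N(0, I). For any orthogonal matrix R, the mixing matrices A and AR induce exactly the same distribution of x, whose covariance is A Aᵀ, so the latent representation cannot be recovered uniquely from observations. Identifiability results therefore rely on additional assumptions, e.g., non-Gaussian or nonstationary latent factors. The matrix A and the rotation angle below are arbitrary choices made for the sketch.

```python
import numpy as np

def covariance_of_x(A: np.ndarray) -> np.ndarray:
    """Covariance of x = A z when z ~ N(0, I): Cov(x) = A A^T."""
    return A @ A.T

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))   # arbitrary (hypothetical) mixing matrix
theta = 0.7                   # arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])  # orthogonal: R R^T = I

# (A R)(A R)^T = A R R^T A^T = A A^T, and a zero-mean Gaussian is fully
# described by its covariance, so both models explain the data equally well.
print(np.allclose(covariance_of_x(A), covariance_of_x(A @ R)))  # True
print(np.allclose(A, A @ R))                                    # False
```

The two models are observationally equivalent yet assign different latent coordinates to the same data point, which is precisely the instability that uniqueness results aim to rule out.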

Candidate profile:
Master's student in Computer Science/Machine Learning or Applied Mathematics

Required education and skills:
Machine learning, data science

Work address:
Simul Research Group @ CRAN

Faculté des Sciences et Technologies

Campus, Boulevard des Aiguillettes

54506 Vandœuvre-lès-Nancy

Website: https://cran-simul.github.io/

Attached document: 202411011605_GenAI-Causality.pdf

Assisting the correction of anomalies in multidimensional and multirelational data on agroecology for animal and plant health

Offer related to the Action/Network: – — –/– — –

Laboratory/Company: Laboratoire d’Informatique, Robotique et Microélec
Duration: 6 months
Contact: alexandre.bazin@lirmm.fr
Publication deadline: 2025-02-01

Context:

Topic:
To achieve the agroecological transition, farmers need knowledge about alternatives to conventional farming techniques. But before a knowledge base (KB) can be used by farmers and scientific experts, its anomalies must be corrected. This internship focuses on the Knomana KB [Silvie et al., 2021], which gathers 48,000 descriptions of uses of plants with pesticidal and antibiotic effects and aims to propose plant-based preparations to replace synthetic chemical products. Dictionaries already make it possible to correct the values of its 31 data types. However, checking the correctness and consistency of the data is too complex to be done manually. For example, an inconsistency between the pesticidal plant, the protected system (e.g., a maize crop), the pest (e.g., an insect) and the geographical location is enough to mislead a farmer. The technique called Attribute Exploration (AE), developed in Formal Concept Analysis, makes it possible to detect and correct these anomalies [Saab et al., 2022] by expressing each piece of knowledge as an implication rule. The rules are presented to experts, who validate or invalidate them in order to bring the KB into a consistent state.
The objective of the internship is to develop a software prototype for detecting and correcting anomalies in multidimensional and multirelational data. This prototype will make it possible to manipulate the data and data types, and to interact with the FCA4J library, to compute the rules, and with the RCAvizIR software, developed with the support of #Digitag (Master's internships in 2022 and 2023), to present the rules in an order that facilitates the experts' correction work.
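A minimal sketch of the basic check behind implication rules in Formal Concept Analysis: an implication P → Q holds in a formal context when every object having all attributes of P also has all attributes of Q. The function below lists the objects that violate a given implication, i.e., records an expert-validated rule would flag for review. The toy context, record identifiers and attribute names are hypothetical, this is not the FCA4J or RCAvizIR API, and a full attribute exploration loop (computing the rule basis, querying the expert) is out of scope.

```python
from typing import Dict, List, Set

# Hypothetical toy formal context: each object (a knowledge-base record) is
# mapped to the set of attributes it has. All names are illustrative only.
Context = Dict[str, Set[str]]

def extent(context: Context, attributes: Set[str]) -> Set[str]:
    """Objects having every attribute in `attributes` (the derivation operator)."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}

def counterexamples(context: Context, premise: Set[str], conclusion: Set[str]) -> List[str]:
    """Objects with all premise attributes but not all conclusion attributes.
    The implication premise -> conclusion holds in the context iff this is empty."""
    return sorted(obj for obj in extent(context, premise)
                  if not conclusion <= context[obj])

context: Context = {
    "r1": {"plant_A", "pest_insect", "crop_maize", "region_X"},
    "r2": {"plant_A", "pest_insect", "crop_maize", "region_X"},
    "r3": {"plant_A", "pest_insect", "crop_maize", "region_Y"},
    "r4": {"plant_B", "pest_fungus", "crop_tomato", "region_X"},
}

# Hypothetical expert-validated rule: plant_A used on maize -> region_X.
print(counterexamples(context, {"plant_A", "crop_maize"}, {"region_X"}))  # ['r3']
```

Here record r3 would be shown to the expert either as a data error to correct or as evidence that the rule itself should be invalidated.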

Candidate profile:
Desired skills:
Artificial intelligence, data mining, JavaScript

Required education and skills:
Main discipline of the project:
Computer science, knowledge extraction, visualization

Secondary discipline of the project:
Life and environmental sciences

Work address:
Université de Montpellier

Attached document: 202411011120_Sujet de stage Digitag 2024-1.pdf

M2 internship offer: STAY project (LISIS-TETIS), Montpellier

Offer related to the Action/Network: – — –/– — –

Laboratory/Company: TETIS
Duration: 5 months
Contact: maguelonne.teisseire@inrae.fr
Publication deadline: 2025-02-01

Context:
This internship is part of the interdisciplinary activities of UMR TETIS within the STAY project (Savoirs Techniques pour l’Autosuffisance, sur YouTube, i.e., "Technical Knowledge for Self-Sufficiency, on YouTube", funded by the CNRS), in partnership with LISIS (Laboratoire Interdisciplinaire Sciences Innovations Sociétés). Agricultural practices are now shared and discussed on YouTube, a video-hosting platform whose popularity is well established: in February 2023, Médiamétrie data reported 48 million unique users in France. Open to all, the platform allows anyone, whether an agricultural professional or not, to become a content creator, and the characteristics and quality of the information shared in this way are already the subject of an abundant literature. This literature shows, among other things, that YouTube is a source of information that shapes how its users assess a situation and that can influence their judgment and actions, sometimes significantly.
Both content users and content producers may be professionals (farmers, Chambers of Agriculture, etc.) or amateurs (gardeners growing their own food in a vegetable garden or small orchard, activists, etc.). We are particularly interested in pests in vegetable and fruit-tree production.

Topic:
The objective of the internship is twofold:
(1) draw up as exhaustive an inventory as possible of the YouTube channels that can be consulted for information on vegetable and fruit-tree production techniques, paying particular attention to channels that refer to pest-control techniques, and distinguishing channels produced by agricultural professionals from those run by amateurs. The first step will be to identify relevant keywords and a list of topics likely to be searched on YouTube;
(2) automatically categorize the content, based on statistics and metadata, in terms of:
– year of appearance;
– number of subscribers, comments, views and likes, with an analysis of the temporal evolution of these indicators and the identification of significant temporal landmarks in the appearance and success of these channels (Covid epidemic, significant climatic events, etc.);
– quantity of content produced;
– categories of content producers (classification to be built), types of content offered and topics covered, relating to agricultural techniques and more specifically to pest-control techniques;
– type of economic strategy used by content creators, in terms of the number of advertisements and other sources of income (contracts, Tipeee crowdfunding, etc.).
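A minimal sketch of how part of objective (2) could be automated, assuming channel metadata has already been collected (for instance through the YouTube Data API). The record fields, the toy channels and the choice of 17 March 2020 (start of the first French lockdown) as the Covid landmark are all assumptions for illustration; the actual project platform may organize this differently.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

@dataclass
class Channel:
    # Hypothetical metadata fields for one YouTube channel.
    name: str
    creator_type: str  # "professional" or "amateur" (classification to be built)
    first_video: date
    subscribers: int
    views: int
    likes: int
    comments: int

# Assumption: start of the first French lockdown used as the Covid landmark.
COVID_LANDMARK = date(2020, 3, 17)

def engagement_rate(ch: Channel) -> float:
    """(likes + comments) per view; 0.0 when the channel has no views."""
    return (ch.likes + ch.comments) / ch.views if ch.views else 0.0

def period(ch: Channel) -> str:
    """Position of the channel's first video relative to the landmark."""
    return "since_covid" if ch.first_video >= COVID_LANDMARK else "before_covid"

def summarize(channels: List[Channel]) -> Dict[str, Counter]:
    """Count channels per year of appearance and per (period, creator type)."""
    return {
        "per_year": Counter(ch.first_video.year for ch in channels),
        "per_period_type": Counter((period(ch), ch.creator_type) for ch in channels),
    }

# Toy, made-up records (not real channels).
channels = [
    Channel("chan_a", "amateur", date(2020, 5, 1), 1200, 50000, 2000, 300),
    Channel("chan_b", "professional", date(2016, 9, 10), 8000, 400000, 9000, 1000),
    Channel("chan_c", "amateur", date(2021, 1, 15), 300, 10000, 500, 100),
]
summary = summarize(channels)
print(summary["per_period_type"])
print([round(engagement_rate(ch), 4) for ch in channels])
```

Keeping the landmark as a single constant makes it easy to test several candidate events (e.g., a climatic episode) against the same channel inventory.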

The intern will be able to build on recent academic work on a similar subject (Bruhl 2023), namely Guillaume Bruhl’s thesis « État des lieux de la vulgarisation scientifique vétérinaire francophone sur Youtube » (an overview of French-language veterinary science popularization on YouTube). The implementations will be integrated into the platform currently being developed by the project.

Candidate profile:
The intern should have a background in computer science, with knowledge of natural language processing and/or machine learning, and an interest in interdisciplinary work. Experience with the Python programming language is a plus.

Required education and skills:

Work address:
500 rue JF Breton 34090 Montpellier

Attached document: 202410290856_Distribution_Stage1_Stay2024.pdf

Post-doctoral position – Modeling high-contrast intensity observations: from data-driven calibration to the integration of physical priors

Offer related to the Action/Network: BigData4Astro/– — –

Laboratory/Company: INRIA / CRAL
Duration: 12+12 months
Contact: olivier.flasseur@univ-lyon1.fr
Publication deadline: 2025-04-30

Context:
The observation of the close environment of stars can reveal the presence of exoplanets and circumstellar disks, providing crucial information for a better understanding of planetary system formation, evolution, and diversity. Given the very small angular separation with respect to the host star and the huge contrast between the (bright) star and the (faint) exoplanets and disks, reconstructing images of the immediate vicinity of a star is extremely challenging. In addition to the use of extreme adaptive optics and a coronagraph, dedicated post-processing methods combining images recorded with the pupil tracking mode of the telescope are needed to efficiently suppress the nuisance component (speckles and noise) corrupting the signals of interest [1].
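For readers less familiar with pupil-tracking observations, the sketch below illustrates the principle with the classical median-subtraction baseline of angular differential imaging (ADI), not with the team's statistical or deep models: in pupil-tracking mode the quasi-static speckles stay fixed in the detector frame while the field of view rotates, so subtracting a temporal median removes the speckles, and derotating and averaging the residuals re-aligns the off-axis source. To keep the example exact and dependency-free, we assume a toy field rotation of exactly 90° between frames (np.rot90), noise-free data and a hypothetical 8×8 scene.

```python
import numpy as np

def median_adi(cube: np.ndarray, k_rot: np.ndarray) -> np.ndarray:
    """Classical median ADI on a toy cube.

    cube:  (T, H, W) frames in detector (pupil-tracking) coordinates.
    k_rot: number of 90-degree sky rotations in each frame (toy stand-in
           for the real, continuous parallactic angles).
    """
    speckle_model = np.median(cube, axis=0)   # quasi-static nuisance estimate
    residuals = cube - speckle_model          # speckle-subtracted frames
    derotated = [np.rot90(r, -k) for r, k in zip(residuals, k_rot)]
    return np.mean(derotated, axis=0)         # re-aligned and combined

# Toy scene: one faint off-axis point source (hypothetical values).
size, flux = 8, 1e-3
sky = np.zeros((size, size))
sky[1, 2] = flux

rng = np.random.default_rng(1)
speckles = rng.uniform(0.0, 1.0, size=(size, size))  # fixed in detector frame
k_rot = np.arange(4)                                  # 0, 90, 180, 270 degrees
cube = np.stack([np.rot90(sky, k) + speckles for k in k_rot])

estimate = median_adi(cube, k_rot)
print(np.allclose(estimate, sky))  # True: speckles removed, source recovered
```

The median works here because the source lands on a different pixel in each frame; close to the star, where the rotation moves the source by less than its own width, this baseline self-subtracts the signal, which is one motivation for the statistical and learned nuisance models described below.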
In recent works, we have introduced innovative post-processing methods that combine statistical modeling of the nuisance component with deep learning [2,3,4]. These models achieve state-of-the-art performance, surpassing traditional inverse-problem approaches in detecting point-like sources such as exoplanets. Simultaneously, new algorithms have been proposed to reconstruct the spatio-spectral flux distribution of circumstellar environments — composed of gas and dust forming disk structures where exoplanets form through material accretion. These reconstruction methods jointly estimate the objects of interest and the nuisance statistics using an inverse problem approach [5,6]. Although these methods demonstrate impressive reconstruction quality, there is still room for improvement, particularly near the star where disk components are most affected by starlight contamination. In addition, for both tasks (detection and reconstruction), current algorithms ignore the temporal and spatial variability of the off-axis point-spread function (PSF), affecting exoplanet detection sensitivity, astro-photometric accuracy, and the spatial resolution of the disk reconstructions.
In this context, data science developments are decisive to improve the fidelity of circumstellar disk reconstruction, especially for fine and faint structures at short angular separations. These advances will also support future instruments by allowing the design of algorithms addressing scientific challenges outlined in the Extremely Large Telescope (ELT) roadmap, using realistic simulations of astrophysical scenes.

Topic:
Research objectives: This postdoctoral project will build on recent advancements by our research team in modeling the nuisance component that corrupts high-contrast total intensity observations. The focus will be on reconstructing circumstellar disks and modeling the signal degradation caused by the measurement process. The key research objectives include:
– Integrating deep models of the nuisance component into algorithms dedicated to circumstellar disk reconstruction in total intensity, potentially inspired by the deep models we have developed for exoplanet detection.
– Incorporating prior information about typical flux distributions in circumstellar environments observed in total intensity. This will involve using dedicated simulators and combining this information with advanced nuisance models in the reconstruction algorithms.
– Addressing the spatio-temporal variability of the off-axis PSF. Two open research directions could be explored:
* Exploiting metadata, such as adaptive optics telemetry, to track instrumental response variations due to changing observing conditions.
* Investigating data-driven approaches to model this variability directly from the science data.
Whenever possible, raw sensor data will be considered rather than pre-processed data to better quantify signal degradation from both the measurement and processing stages, and to model and propagate uncertainties end-to-end. This process will involve calibrating and assembling raw data using inverse-problem methods developed in the DDISK ANR project (PI: Maud Langlois). While complementary, the priorities of these research objectives can be adjusted based on the applicant’s expertise.

Data and Instruments: The project will focus on developing new processing algorithms using total intensity observations (imaging and spectroscopy, i.e., spatio-temporal-spectral data) from the SPHERE instrument, currently operating on the Very Large Telescope. Once a proof of concept is established, simulations for HARMONI, one of the first-light instruments of the upcoming ELT, may be considered. The algorithms will then be adapted to account for HARMONI’s specific features, particularly its higher spectral resolution.

Candidate profile:
Collaboration and Location: The postdoc will be part of a multidisciplinary collaboration. She/he will collaborate with Jean Ponce (ENS-PSL, Paris), Julien Mairal (INRIA, Grenoble) and Olivier Flasseur (CRAL, Lyon). Additional collaborations would involve experts in observational astrophysics, including Maud Langlois (CRAL, Lyon) and Anne-Marie Lagrange (LESIA, Paris). The postdoc will also collaborate with a third-year PhD student at INRIA. The postdoc will be based primarily at INRIA, with regular visits to CRAL.

Duration: The initial appointment is for one year, with a possible one-year extension (subject to other sources of funding).

Desired Skills and Expertise: The candidate should hold a PhD in signal and image processing, applied mathematics, machine learning, computer vision or a related field. A strong interest in physics, multidisciplinary research and scientific applications is a plus.

Deliverables: The developed algorithms will be disseminated in peer-reviewed journals and relevant conferences in astronomy and computer science. The associated code will be made public within the timeline of the position.

Contacts and Application Process: Applicants should send the following documents to Jean Ponce (jean.ponce@ens.fr), Julien Mairal (julien.mairal@inria.fr), and Olivier Flasseur (olivier.flasseur@univ-lyon1.fr): a CV outlining qualifications and previous experiences, a cover letter detailing research interests, a list of publications, and a list of up to three referees ready to write a recommendation. Requests for additional information on the position can be sent directly by email, and a video-conference could be arranged. Applications will continue to be reviewed until the position is filled.
This position falls within a sector subject to the protection of scientific and technical potential (PPST) and therefore, in accordance with regulations, the applicant’s arrival must be authorized by the competent authority of the Ministry of Higher Education and Research (MESR).

Required education and skills:

Work address:
INRIA (Paris or Grenoble), close collaborations with CRAL (Lyon)

Attached document: 202410281051_Sujet PostDoc PEPR.pdf

Representation of physical quantities on the Semantic Web

Offer related to the Action/Network: – — –/– — –

Laboratory/Company: Mines Saint-Étienne/LIMOS
Duration: 5-6 months
Contact: antoine.zimmermann@emse.fr
Publication deadline: 2025-01-31

Context:

Topic:
Physical quantities form an important part of what is represented in scientific data, medical data, industry data, open data, and to some extent, various private data.

Whether it is distances, speeds and payloads in transportation; concentrations, masses and moles in chemistry; power, intensities and voltages in the energy sector; furniture dimensions; or weights, heights of people, durations and many others in health, there is a need to represent physical quantities, store them, process them and exchange them between information systems, potentially on a global scale, often over the Internet and via the Web.

In this internship, we seek to precisely define a way to unambiguously represent physical quantities for the Web of Data. More precisely, we will study the proposals made to encode physical quantities in the standard data model of the Semantic Web, RDF. We will be particularly interested in the use of a data type dedicated to this encoding, probably adapted from the proposal of Lefrançois & Zimmermann (2018) based on the UCUM standard.

Having established a rigorous definition of the data type (possibly its variants, if relevant), we will focus on implementing a module that can read/write and process physical quantities and their operations within the RDF data manipulation APIs, for the management, querying and reasoning with knowledge graphs containing physical quantities.
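As a minimal sketch of what reading and processing such quantities could involve, the Python code below parses quantity literals whose lexical form is assumed to be a decimal number followed by a space and a UCUM code (e.g. "1.5 km", as in the cdt:ucum proposal), converts them to a base unit and adds two quantities with exact decimal arithmetic. The three-unit conversion table and the function names are hypothetical; a real implementation must cover the full UCUM grammar (prefixes, products, powers, dimension checks) and plug into an actual RDF API, which this sketch does not attempt.

```python
from decimal import Decimal
from typing import Tuple

# Hypothetical, deliberately tiny unit table: UCUM code -> (factor, base unit).
# A real implementation would derive factors from the full UCUM specification.
UNITS = {
    "m": (Decimal("1"), "m"),
    "km": (Decimal("1000"), "m"),
    "cm": (Decimal("0.01"), "m"),
}

def parse_quantity(lexical: str) -> Tuple[Decimal, str]:
    """Parse a lexical form such as '1.5 km' into (value, unit code)."""
    value, unit = lexical.strip().split(" ", 1)
    if unit not in UNITS:
        raise ValueError(f"unsupported unit: {unit}")
    return Decimal(value), unit

def to_base(lexical: str) -> Tuple[Decimal, str]:
    """Convert a quantity to its base unit, keeping exact decimal arithmetic."""
    value, unit = parse_quantity(lexical)
    factor, base = UNITS[unit]
    return value * factor, base

def add(a: str, b: str) -> str:
    """Add two quantities of the same dimension; the result is in the base unit."""
    (va, ua), (vb, ub) = to_base(a), to_base(b)
    if ua != ub:
        raise ValueError(f"incompatible units: {ua} vs {ub}")
    return f"{va + vb} {ua}"

print(add("1.5 km", "300 m"))  # 1800.0 m
```

Using Decimal rather than float keeps values such as 0.01 exact, which matters if two implementations are to be compared for compliance on the same test literals.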

The ambition is that, on the one hand, the specification will become a de facto standard within a few years, before perhaps becoming a de jure standard; and that, on the other hand, the implementation will serve as the reference against which the compliance levels of future implementations can be compared.

This study should lead to the publication of a scientific paper in a high-impact journal.

References

– Maxime Lefrançois and Antoine Zimmermann (2018). The Unified Code for Units of Measure in RDF: cdt:ucum and other UCUM Datatypes. In The Semantic Web: ESWC 2018 Satellite Events, Heraklion, Crete, Greece, June 3-7, 2018, Revised Selected Papers, volume 11155 of Lecture Notes in Computer Science, pp. 196–201, Springer.
– Gunther Schadow and Clement J. McDonald (2017). The Unified Code for Units of Measure. Technical report, Regenstrief Institute, Inc., November 21, 2017.

Candidate profile:
M2 level (or equivalent) in computer science, with knowledge of Semantic Web technologies. The candidate must also have either very good programming skills in Java or a strong aptitude for formal and abstract thinking.

Required education and skills:

Work address:
Mines Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2

The ACSS Institute of Université PSL is recruiting a research engineer in data science for the social sciences

Offer related to the Action/Network: SimpleText/– — –

Laboratory/Company: Université de Paris-Dauphine – PSL
Duration: 1 year, renewable
Contact: bruno.chavesferreira@dauphine.fr
Publication deadline: 2025-02-25

Context:
Created within Université Paris Sciences et Lettres (PSL) and hosted at Paris Dauphine, the "Applied Computational Social Sciences" Institute aims to strengthen research on major societal issues (political and social cohesion, ecological transition, digital transformation, economic efficiency and competitiveness) by combining data science and the social sciences.

The Institute collects and processes heterogeneous data at scale, both to enable scientific advances and to help inform public debate and decision-making. It brings together a multidisciplinary team of social science researchers and relies on a team of data science engineers who contribute their expertise to build original databases and perform complex processing. These projects are initiated and led by laboratories of the CNRS, Dauphine, ENS, INSP and Mines ParisTech. The results are intended to be widely disseminated to institutional partners and the business world.

Topic:
As part of the development of the ACSS Institute, Université PSL is recruiting a research engineer (IR) in data science. She/he will be responsible for implementing methods and tools to collect and process data from a variety of sources (the Web, institutional databases, archives, etc.). She/he will also be responsible for ensuring that good practices in code and data development and management are followed. Finally, she/he will contribute to the development of statistical or machine learning models (in particular in natural language processing).

Candidate profile:
Engineering degree in computer science (or mathematics/statistics) with at least 3 years of experience, or a PhD in the field.

Required education and skills:
Experience in developing deep neural networks and other advanced statistical models applied to natural language processing on large corpora.
Proficiency in the Python and/or R data science ecosystems.
More specifically, in Python: proficiency with numpy, pandas, pytorch and the Hugging Face ecosystem.
In R: proficiency with the tidyverse, tidymodels and associated libraries, as well as torch.
Proficiency with relational and NoSQL databases.
Understanding of the scientific methods of the humanities and social sciences.

Work address:
Université de Paris Dauphine – PSL
Pl. du Maréchal de Lattre de Tassigny, 75016 Paris

Attached document: 202410250904_Ingenieur_IR_ACCS_2024_2_fr.pdf

Pattern sampling on heterogeneous data

Offer related to the Action/Network: – — –/– — –

Laboratory/Company: Laboratoire GREYC, Université de Caen
Duration: 6 months
Contact: abdelkader.ouali@unicaen.fr
Publication deadline: 2025-02-25

Context:
This Master's internship is part of the FIDD project (Facilitated Exploration: Interactive Constraint-Driven Data Mining), funded by the ANR (Agence Nationale de la Recherche), which will start in February 2025. The main objective of FIDD is to improve the user experience in the interactive data mining loop by using constraints to capture the user's interests and efficiently guide the mining process. The project brings together 6 national research laboratories: LISN [UMR 5506 – Université de Paris-Saclay], LIRMM [UMR 5506 – Université de Montpellier], LS2N [IMT Atlantique Nantes], GREYC [UMR 6072 – Université de Caen], LIFO [EA 4022 – Université d’Orléans] and CRIL [UMR 8188 – Université d’Artois]. In collaboration with the SME Deeplink-Medical, a flagship application is being considered to improve patient care by radiologists based on their interactions.

Topic:
Pattern mining [1] consists in extracting, from a dataset, regularities or recurring patterns that can be used to generate meaningful knowledge. To shorten the procedure and give the user more control, the late 2000s and early 2010s saw the development of interactive mining methods [7]: at each iteration, a small set of patterns is proposed to the user, who examines these partial results and gives feedback that the algorithm takes into account in the next iteration(s). Because of the very large number of extracted patterns, such an approach nevertheless requires output pattern sampling techniques, such as those proposed in [4, 5, 6, 3], to select a representative subset of the pattern set. These techniques reduce computation time and facilitate analysis while preserving the essence of the information contained in the patterns of the database. In these techniques, patterns are often drawn proportionally to a measure reflecting some interest of the user. The sampling process can thus integrate constraints intended to influence the draw itself or to specifically target patterns that satisfy certain defined properties. More precisely, this sampling problem is formulated as follows [4, 2]: given a database S, a pattern language L, a set of constraints C and a quality measure φ: L → ℝ, randomly draw patterns that satisfy the constraints in C with a probability proportional to their quality.
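The problem statement above can be illustrated with a deliberately naive Python sketch for itemsets: it enumerates the whole pattern language L over a toy transaction database S, keeps the patterns satisfying every constraint in C, and draws k patterns with probability P(p) = φ(p) / Σ_q φ(q), where q ranges over the patterns satisfying C. Here φ is the support, assumed non-negative. Full enumeration is precisely the cost that output sampling techniques try to reduce, so this only makes the target distribution concrete; the database, constraint and item names are hypothetical.

```python
import random
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence

Pattern = FrozenSet[str]
Constraint = Callable[[Pattern], bool]

def support(pattern: Pattern, database: Sequence[Pattern]) -> int:
    """Number of transactions of the database that contain the pattern."""
    return sum(1 for transaction in database if pattern <= transaction)

def sample_patterns(database: Sequence[Pattern],
                    constraints: List[Constraint],
                    quality: Callable[[Pattern], float],
                    k: int,
                    rng: random.Random) -> List[Pattern]:
    """Draw k patterns satisfying all constraints, with P(p) proportional to quality(p)."""
    items = sorted(set().union(*database))
    # Naive step: materialize the whole language L (every non-empty itemset).
    language = [frozenset(c) for r in range(1, len(items) + 1)
                for c in combinations(items, r)]
    candidates = [p for p in language
                  if all(check(p) for check in constraints) and quality(p) > 0]
    weights = [quality(p) for p in candidates]
    return rng.choices(candidates, weights=weights, k=k)

# Hypothetical toy database S and constraint set C.
S = [frozenset(t) for t in ({"a", "b", "c"}, {"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "d"})]
C = [lambda p: len(p) >= 2]  # only patterns with at least two items

def phi(p: Pattern) -> int:
    """Quality measure: support of the pattern in S."""
    return support(p, S)

draws = sample_patterns(S, C, phi, k=10_000, rng=random.Random(0))
share_ab = sum(p == frozenset({"a", "b"}) for p in draws) / len(draws)
print(round(share_ab, 2))  # expected to be close to 3/8 = 0.375
```

In this toy database the patterns satisfying C have supports {a,b}: 3, {a,c}: 2, {b,c}: 1, {b,d}: 1 and {a,b,c}: 1, so {a,b} should be drawn about 3 times out of 8.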

Candidate profile:
Master 2 level (or equivalent) in computer science (or applied mathematics), with an interest in artificial intelligence, constraint programming and data mining.

Required education and skills:
Programming skills in Java, Python and C++, as well as a good understanding of data mining algorithms and of constraint and SAT solving, will be appreciated. The working language is French or English.

Work address:
Laboratoire GREYC, CNRS UMR 6072, Université de Caen, 14000 Caen, with regular interactions with the Constraints and Learning team (Contraintes et Apprentissage) at LIFO, EA 4022, Université d’Orléans.

Attached document: 202410241252_FIDD___Sujet_de_Stage_M2-1.pdf

Fair AI: Deep Decentralized Learning on High-Dimensional Images

Offer related to the Action/Network: – — –/– — –

Laboratory/Company: Université d’Artois/Centre de recherche en informat
Duration: 18 months
Contact: wissem.inoubli@univ-artois.fr
Publication deadline: 2024-12-23

Context:
The Centre de recherche en informatique de Lens (CRIL) and the FOX team of the Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) are recruiting a postdoctoral researcher for 18 months in the field of deep learning, preferably with expertise in graph learning.

Topic:
The scientific context focuses on Graph Neural Networks (GNNs), a type of neural network that has seen remarkable growth in a wide range of applications, including computer vision, bioinformatics, finance, chemistry and many others. This popularity stems from their ability to identify complex, often hidden, patterns and features in data. However, like other deep learning models, GNNs face challenges related to hyperparameter optimization. In addition to these general challenges, GNNs face specific problems, such as over-smoothing. This problem occurs when similar (or even identical) representations are assigned to most, if not all, nodes of the graph, which complicates classification or regression tasks by limiting the model's ability to differentiate or generalize between nodes. Beyond the challenges of modeling and optimizing GNNs, other major problems arise when training these models, this time related to the data rather than to the model itself. These challenges include: (i) the algorithmic complexity of training GNNs, especially on large datasets such as hyperspectral images; (ii) a second challenge, although addressed in other types of complex data such as 4D spatio-temporal medical images.
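Over-smoothing can be observed even without training. The sketch below (a toy illustration, not the project's model) repeatedly applies mean aggregation over neighbors with self-loops, H ← D⁻¹(A + I)H, i.e., a simplified, random-walk-normalized variant of a GCN layer with the weights and non-linearity removed. On a connected graph the node representations converge to the same vector as depth grows, so the spread printed for each depth shrinks toward zero. The path graph and random features are hypothetical.

```python
import numpy as np

def propagate(adjacency: np.ndarray, features: np.ndarray, layers: int) -> np.ndarray:
    """Apply `layers` rounds of H <- D^-1 (A + I) H (mean over neighbors and self)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])
    p = a_hat / a_hat.sum(axis=1, keepdims=True)  # row-normalized: D^-1 (A + I)
    h = features
    for _ in range(layers):
        h = p @ h
    return h

def spread(h: np.ndarray) -> float:
    """Mean distance of node representations to their average (0 = fully smoothed)."""
    return float(np.linalg.norm(h - h.mean(axis=0), axis=1).mean())

# Hypothetical path graph with 5 nodes and random 3-dimensional node features.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
X = np.random.default_rng(0).normal(size=(5, 3))

for depth in (0, 2, 8, 32):
    print(depth, round(spread(propagate(A, X, depth)), 4))
```

Because D⁻¹(A + I) is row-stochastic and, thanks to the self-loops, aperiodic on a connected graph, its powers converge to a rank-one matrix, which is why every node ends up with the same representation in deep stacks.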

The research for this postdoctoral position will be conducted jointly by the two laboratories. The objective is to propose a fair (FAIR) and distributed model to handle algorithmic complexity in learning while addressing the underlying challenges such as communication, load balancing in distributed processing, and data partitioning. The postdoctoral research activities will take place in four phases:
1. Literature review of decentralized learning systems and of methods/algorithms for graph learning.
2. Proposal of a distributed model/architecture for graph learning.
3. Improvement of the proposed model to avoid learning bias.
4. Validation of both models on use cases: classification and/or semantic segmentation of high-dimensional images (HSI, MRI, CT, etc.).

Candidate profile:
The successful candidate will hold a PhD in computer science (preferably specializing in artificial intelligence and deep learning), with knowledge of distributed systems.
Strong interpersonal skills are required: understanding needs and justifying choices (orally and in writing). Development skills (PyTorch, PyTorch Geometric, etc.) are necessary.

Required education and skills:

Job address:
Centre de recherche en informatique de Lens

Deep Learning architectures for generating rehabilitation human motion

Offer linked to the Action/Network: –/–

Laboratory/Company: IRIMAS, Université Haute-Alsace
Duration: 6 months
Contact: maxime.devanne@uha.fr
Publication deadline: 2024-12-23

Context:
Human motion analysis is crucial for studying people and understanding how they behave, communicate and interact with real-world environments. Due to the complex nature of body movements and the high cost of motion capture systems, acquiring human motion is not straightforward, which constrains data production. Fortunately, recent approaches that estimate human poses from videos offer new opportunities to analyze skeleton-based human motion. While skeleton-based human motion analysis has been extensively studied for behavior understanding tasks such as action recognition, more work remains to be done on human motion generation. In particular, automatically generating motion sequences makes it possible to rapidly increase the amount of data and improve Deep Learning-based analysis algorithms. This is crucial in medical contexts such as physical rehabilitation, where acquiring data is challenging. Rehabilitation human motions correspond to rehabilitation exercises proposed by physiotherapists. Unlike classification tasks, the target task in human rehabilitation assessment is often a regression problem: given a motion sequence, the goal is to predict the associated performance score given by physiotherapists.
For several years, human motion generation has been made possible by the emergence of Generative Adversarial Networks (GANs), Variational AutoEncoders (VAEs) and Diffusion models. While most of these works have considered motion capture (mocap) data, we consider noisy skeleton data estimated from videos, as it is easily applicable in real-world scenarios for the general public.

Subject:
The goal of this internship is to investigate deep generative models for skeleton-based human motion sequences, with a particular focus on rehabilitation data. Inspired by recent effective Deep Learning-based approaches, the aim is to generate full skeleton-based rehabilitation motion sequences. It is therefore crucial to investigate how deep generative models can handle such noisy and possibly incomplete data in order to generate novel rehabilitation motion sequences that are as natural and varied as possible.
In particular, the candidate will work on the following tasks:
– Deep generative models adapted to rehabilitation data: building on existing work, the goal is to build generative models for rehabilitation sequences. To this end, the candidate will investigate different generative models, such as Diffusion models, in order to propose and develop a complete Deep Learning model for generating skeleton-based human motions. These models will be trained on publicly available datasets such as the Kimore dataset.
– Evaluation of deep generative models: experimental evaluation is crucial to validate the proposed model. Unlike motion recognition, where classification accuracy is a natural way to assess an approach, evaluating motion generation is not straightforward. Dedicated metrics evaluating both the naturalness and the diversity of the generated sequences, as well as the impact of the newly generated sequences on a classification task, will be considered.
– Text to rehabilitation motion: the generative models will then be adapted to take as input text sequences describing rehabilitation exercises. This will be particularly useful for creating new rehabilitation exercises.
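As an example of the diversity metrics mentioned in the evaluation task, the sketch below implements one common formulation: draw random pairs of generated sequences and average the Euclidean distance between their feature vectors. It assumes each sequence has already been encoded into a fixed-length vector (for instance by a pretrained motion recognition network, which the posting does not specify); the function name, the seed and the toy vectors are hypothetical.

```python
import math
import random


def diversity(features, num_pairs, seed=0):
    """Average Euclidean distance between randomly drawn pairs of generated sequences.

    `features` holds one fixed-length vector per generated sequence (assumed to come
    from a pretrained motion encoder). Higher values mean more varied samples; a
    generator that collapses to a single motion scores 0.
    """
    if len(features) < 2:
        raise ValueError("need at least two generated sequences")
    rng = random.Random(seed)  # fixed seed so the metric is reproducible
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.sample(range(len(features)), 2)  # two distinct sequences
        total += math.dist(features[i], features[j])
    return total / num_pairs


# Hypothetical feature vectors for generated sequences.
collapsed = [[1.0, 2.0]] * 4  # every sample is the same motion
varied = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [1.0, -1.0]]
print(diversity(collapsed, 100), diversity(varied, 100))
```

Naturalness is usually assessed separately, for example with a distribution distance between features of real and generated sequences, since a high diversity score alone says nothing about whether the samples look realistic.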

Candidate profile:
The candidate must fit the following requirements:
– Registered in Master 2 or last year of Engineering School (or equivalent) in Computer Science
– Advanced skills in Python programming are mandatory
– Good skills in Machine Learning & Deep Learning using related libraries (scikit-learn, Tensorflow, Pytorch, etc.) are required
– Knowledge of and/or initial experience in human motion analysis will be appreciated
– Knowledge of and/or initial experience in Natural Language Processing, to handle text-to-motion generation, will also be appreciated

Job address:
Université Haute-Alsace
12 rue des Frères Lumière
68093 Mulhouse

Attached document: 202410230753_internship_position_delegation_generation_2025.pdf

Detection of wild animals in zoo enclosure using thermal cameras and deep learning

Offer linked to the Action/Network: –/–

Laboratory/Company: IRIMAS, Université Haute-Alsace
Duration: 6 months
Contact: maxime.devanne@uha.fr
Publication deadline: 2024-12-23

Context:
Nowadays, zoo enclosures increasingly resemble the natural biotopes of wild animals. This implies large enclosures with biological elements such as plants and trees, and landscape elements such as rocks and hills. Although these new enclosure designs genuinely improve the welfare of the animals they house, the animals can become hard to see. This raises two problems:
– Frustration for visitors who want to see the animals
– Difficulty for zookeepers to observe the animals
In particular, the latter can cause a) difficulty noticing abnormal behavior in an animal, which can delay veterinary care if needed, and b) accidents if a zookeeper has to enter an enclosure without a clear view of the animal. To cope with these problems, cameras can be installed around or inside the enclosures to monitor the animals in real time. Thermal cameras in particular have proven very effective in densely planted enclosures and at night. The goal of this internship is to use a multi-camera setup and data fusion to detect animals with deep learning techniques such as CNNs or YOLO.

Subject:
The intern will first review the existing literature (articles and surveys) on zoo animal monitoring. Then, the goal is to select and purchase cameras (RGB, thermal, other modalities) based on the state of the art, and to install them with the help of the Mulhouse Zoo staff. In parallel, the intern will identify in the literature neural networks, such as YOLO, that can predict a bounding box around the animal's position in an image. The neural network can be trained on databases such as DeepFaune. Finally, data fusion can be explored to enhance the performance of the neural networks by combining RGB and thermal predictions. GPU-based architectures will be used, with programming in Python.
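Coupling RGB and thermal predictions can take many forms. The sketch below shows a simple late-fusion scheme under stated assumptions: both cameras are registered to a common pixel frame, each detector (for example a YOLO model) outputs `(box, score)` pairs with boxes as `(x1, y1, x2, y2)`, detections are matched by intersection over union (IoU), and matched pairs receive a weighted score and a confidence-weighted box. The function names `iou` and `fuse_detections`, the 0.5 IoU threshold and the equal modality weights are hypothetical choices, not part of the posting.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse_detections(rgb, thermal, iou_thr=0.5, w_rgb=0.5):
    """Late fusion of (box, score) detections from two registered cameras.

    Matched pairs get a weighted score and a confidence-weighted box; a detection
    seen by only one camera keeps its box, with its score scaled by that camera's weight.
    """
    w_th = 1.0 - w_rgb
    fused, used = [], set()
    for box_r, s_r in rgb:
        best_j, best_iou = None, iou_thr
        for j, (box_t, _) in enumerate(thermal):
            overlap = iou(box_r, box_t)
            if j not in used and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is None:  # RGB-only detection
            fused.append((box_r, w_rgb * s_r))
            continue
        used.add(best_j)
        box_t, s_t = thermal[best_j]
        score = w_rgb * s_r + w_th * s_t
        if score > 0:
            box = tuple((w_rgb * s_r * r + w_th * s_t * t) / score for r, t in zip(box_r, box_t))
        else:
            box = tuple((r + t) / 2 for r, t in zip(box_r, box_t))
        fused.append((box, score))
    # Thermal-only detections, e.g. an animal hidden behind foliage in the RGB view.
    fused.extend((bt, w_th * st) for j, (bt, st) in enumerate(thermal) if j not in used)
    return fused


rgb = [((0, 0, 10, 10), 0.8)]
thermal = [((1, 1, 11, 11), 0.6), ((50, 50, 60, 60), 0.9)]
for box, score in fuse_detections(rgb, thermal):
    print([round(c, 2) for c in box], round(score, 3))
```

In this toy case the animal seen by both cameras gets a fused score of 0.7, while the one seen only by the thermal camera is kept with a reduced score of 0.45. Giving the thermal branch a larger weight at night would be a natural variation.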

Candidate profile:
Final-year student in Master 2 / Engineering school (BAC+5), with an Artificial Intelligence / Computer Vision background. Good programming skills are expected (C, C++, Python). Prior experience with camera acquisition, particularly thermal imaging, is a plus.

Job address:
Université Haute-Alsace
12 rue des Frères Lumière
68093 Mulhouse

Attached document: 202410230749_Master_internship_zooAI_2025.pdf

MaDICS

Masses de Données, Informations et Connaissances en Sciences

Big Data - Data Science
