Post-Doc at BABBAR.TECH: Web Page Segmentation

Offer related to Action/Network: none/none

Laboratory/Company: GREYC / Babbar.tech
Duration: 12-18 months
Contact: recrutement@babbar.tech
Publication deadline: 2024-04-30

Context:
The GREYC lab conducts research in digital science, with activities in image processing, machine learning, artificial intelligence, computer security, fundamental computer science, Web science, and electronics.

Babbar specializes in web data collection and provides its users with a large-scale view of the web graph. Babbar crawls more than 1.5B pages per day, and its index currently contains information about more than 1500B URLs.

Subject:
The postdoctoral scholar will work on Web Page Segmentation, with the primary goal of detecting the different zones of a web page, selecting the areas of interest, and extracting meaningful content. This interdisciplinary research project combines structural analysis, natural language processing, and machine learning techniques to develop algorithms capable of segmenting web pages into meaningful and semantically distinct regions.

Candidate profile:
– Hold a recent Ph.D. degree in Computer Science, Electrical Engineering, or a related field.

Required education and skills:
– Demonstrate a strong research background in natural language processing or machine learning.
– Possess a track record of publications in top-tier conferences/journals related to machine learning, NLP, or related areas.
– Have strong programming skills.
– Have excellent written and verbal communication skills in English, as well as interpersonal skills.

Job location:
Caen, France

Attached document: 202401091508_Post-doc BABBAR.pdf

Postdoctoral position – G-GENOCOD (Graph-GEneration for NOvel COmpound Discovery)

Offer related to Action/Network: none/Doctoral students

Laboratory/Company: LERIA (Université d’Angers)
Duration: 18 months
Contact: nicolas.gutowski@univ-angers.fr
Publication deadline: 2024-03-31

Context:
In chemistry, the discovery of new molecules often results from refining an already known effective compound through chemical reactions to enhance its properties. The emergence of a truly new molecule is a rarer phenomenon. A research theme has developed around this objective, focusing on the de novo generation of molecules with desired properties. Among the challenges in this research area are the size of the search space and the difficulty of generating synthesizable molecules.
Molecules can be represented as graphs, where vertices are labeled according to the type of atom and edges are labeled according to the type of bond. Molecule generation is therefore a graph-generation problem whose goal combines one or more functions to optimize with constraints to satisfy. Thus, the G-GENOCOD project (Graph-Generation for Novel Compound Discovery), although applied to chemistry, addresses a much broader problem of generating complex graph structures with a very large space of composed actions.
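To make the graph representation concrete, here is a minimal Python sketch of a molecule stored as a labeled graph, with a simple valence check standing in for the "constraints to satisfy". The class name `MolecularGraph`, the `MAX_VALENCE` table, and the carbon dioxide example are illustrative assumptions; they are not taken from EvoMol or G-GENOCOD.

```python
from dataclasses import dataclass, field

# Illustrative valence limits (assumption); real chemistry toolkits cover far more cases.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}


@dataclass
class MolecularGraph:
    """Molecule as a labeled graph: atoms are vertices, bonds are edges."""
    atoms: list = field(default_factory=list)  # vertex labels, e.g. "C"
    bonds: dict = field(default_factory=dict)  # (i, j) with i < j -> bond order

    def add_atom(self, symbol):
        """Add a vertex labeled with an atom type and return its index."""
        self.atoms.append(symbol)
        return len(self.atoms) - 1

    def add_bond(self, i, j, order=1):
        """Add (or overwrite) an edge labeled with a bond order."""
        if i == j:
            raise ValueError("a bond needs two distinct atoms")
        self.bonds[(min(i, j), max(i, j))] = order

    def valence(self, i):
        """Sum of the bond orders of all edges incident to atom i."""
        return sum(order for (a, b), order in self.bonds.items() if i in (a, b))

    def is_valid(self):
        """Constraint check: no atom exceeds its maximum valence."""
        return all(
            self.valence(i) <= MAX_VALENCE[symbol]
            for i, symbol in enumerate(self.atoms)
        )


# Carbon dioxide, O=C=O: three vertices and two double bonds.
co2 = MolecularGraph()
c = co2.add_atom("C")
o1 = co2.add_atom("O")
o2 = co2.add_atom("O")
co2.add_bond(c, o1, order=2)
co2.add_bond(c, o2, order=2)
print(co2.valence(c), co2.is_valid())
```

A generator that edits such a graph (adding atoms, changing bond orders) can call `is_valid()` after each action to reject structures that violate the constraint.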

Subject:
The G-GENOCOD project follows EvoMol, an evolutionary algorithm for molecule generation developed by an interdisciplinary team from LERIA and MOLTECH. EvoMol achieves benchmark results, but significant challenges remain, which G-GENOCOD aims to address:
1. The first is conditioned by the goal of realism (generating synthesizable molecules), which is crucial for real-world applications. Methods derived from goal-conditioned reinforcement learning (RL) are expected to yield strong synthesizability properties while remaining explainable.
2. The second is conditioned by the selection of optimal actions to achieve the desired chemical properties (“properties conditioned”). The choice of actions on the graph is currently random. One would expect an intelligent method (here, RL) to apply a policy that selects actions that have worked in the past, just as a chemist would add a known chemical function to enhance a target property.
3. The third objective is to evolve towards generating molecules according to several defined objectives (multi-objective optimization).
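The contrast in point 2 between random action choice and a policy that favors actions that have worked in the past can be illustrated with a deliberately simple epsilon-greedy selector. This is only a sketch of the general idea, not the method planned for G-GENOCOD; the action names (`add_atom`, `change_bond`, `remove_atom`) and the placeholder reward are hypothetical.

```python
import random


class EpsilonGreedyActionSelector:
    """Prefer the action with the best average past reward, with some exploration."""

    def __init__(self, actions, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.epsilon = epsilon                       # probability of a random action
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.actions}   # times each action was tried
        self.values = {a: 0.0 for a in self.actions} # running mean reward per action

    def select(self):
        """Explore with probability epsilon, otherwise exploit the best-known action."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[a])

    def update(self, action, reward):
        """Incrementally update the mean reward observed for this action."""
        self.counts[action] += 1
        n = self.counts[action]
        self.values[action] += (reward - self.values[action]) / n


# Hypothetical graph-editing actions; the reward here is a placeholder, whereas a
# real system would score the resulting molecule against the target property.
selector = EpsilonGreedyActionSelector(
    ["add_atom", "change_bond", "remove_atom"], epsilon=0.1, seed=0
)
for _ in range(20):
    action = selector.select()
    reward = 1.0 if action == "add_atom" else 0.0
    selector.update(action, reward)
```

Setting `epsilon=1.0` recovers purely random action choice, which makes the two regimes easy to compare in an experiment.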

Candidate profile:
Knowledge:
– Reinforcement learning
– Graph theory
– Theorems and proofs of convergence
– Probabilistic reasoning

Know-how:
– Python development
– Development using scikit-learn, PyTorch
– Writing scientific articles in LaTeX

Soft skills:
– Efficient and responsive
– Autonomous
– Proactive
– Rigorous

Required education and skills:
PhD obtained less than 3 years ago
Specialty: Computer Science, Artificial Intelligence

Job location:
UFR Sciences, 2 Bd de Lavoisier, 49000 Angers, FRANCE

Attached document: 202401082139_FP_POST-DOC_G-GENOCOD-ENG.pdf

Professor in Machine Learning, INSA Lyon/LIRIS

Offer related to Action/Network: none/none

Laboratory/Company: LIRIS, INSA Lyon
Duration: permanent position (CDI)
Contact: remy.cazabet@univ-lyon1.fr
Publication deadline: 2024-03-31

Context:
INSA Lyon is recruiting a professor in machine learning, attached to LIRIS.
The DM2L team (Data Mining & Machine Learning) is the laboratory's priority.
The announcement is detailed below.

Subject:
LIRIS (UMR CNRS 5205) seeks to strengthen its impact in machine learning, a central theme of INSA Lyon's "Information and Digital Society" challenge, of which artificial intelligence is a pillar. The successful candidate must demonstrate the ability to conduct high-level research on the fundamental aspects of machine learning methods and algorithms for complex data (structured or unstructured, such as text, images, or graphs). The development of synergies between neural learning, which is flexible and high-performing, and symbolic learning, which is robust and supports human interaction, will be clearly valued, particularly for explainable and frugal AI. The candidate must also have implemented and studied model learning methods and demonstrated the ability to develop innovative methods for data analysis and interpretation in varied application settings.

Candidate profile:
The successful candidate must have demonstrated the ability to develop their research through national and international collaborations, via structuring projects with academic and/or industrial partners. In terms of responsibilities, they may take charge of leading LIRIS research activities on machine learning. Their research project will fit within the DM2L team (the LIRIS priority) or the IMAGINE team.

Required education and skills:
Feel free to get in touch for more information.
DM2L team:
Céline Robardet (celine.robardet@insa-lyon.fr)
Rémy Cazabet (remy.cazabet@univ-lyon1.fr)

Job location:
INSA Lyon/LIRIS

Two postdoctoral fellows – AI for breast cancer screening

Offer related to Action/Network: DOING/Doctoral students

Laboratory/Company: ETS Montréal, CentraleSupelec
Duration: 18 months
Contact: pablo.piantanida@mila.quebec
Publication deadline: 2024-02-29

Context:
We are excited to share an opportunity for two postdoctoral fellows, each with an 18-month tenure, to contribute actively to research in the field of AI for breast cancer screening.

Subject:
Our project, funded by FRQS and Health Data Hub and titled “AI Foundation Models for Breast Cancer Screening: Advancing Early Detection through AI,” is seeking skilled individuals to join our international team. The team brings together the International Laboratory on Learning Systems (ILLS) together with the Quebec AI Institute, located in Montreal (QC, Canada), and MICS, located at CentraleSupelec within Paris-Saclay University (France). This role offers a key position in shaping the development and progress of AI-driven solutions for early breast cancer detection.

For further details check: https://sites.google.com/mila.quebec/pablo-piantanida/openings?authuser=0#h.kyzvdsd2q45m

Candidate profile:
= Position Qualifications =
+ PhD in Computer Science, Machine Learning, Computer Engineering, Mathematics, or a related field (e.g. applied mathematics/statistics).
+ Very good understanding of Machine Learning theory and techniques, as well as of computer vision.
+ Strong publication track record in recognized venues of computer vision (CVPR, ECCV, ICCV), machine learning (NeurIPS, ICLR, ICML) and/or medical image computing (MedIA, IEEE TMI, MICCAI).
+ Good programming skills in Python (PyTorch).
+ Application experience or domain knowledge in medical image processing is a plus.
+ Good communication skills in written and spoken English.
+ Creativity and ability to formulate problems and solve them independently.

Required education and skills:
= How to apply =
If you are interested, please send us the following elements as soon as possible and no later than January 20th:
+ Detailed CV.
+ Letter of motivation.
+ Elements of bibliography or personal achievements related to a research activity.
+ 2 references or recommendation letters.

If you are interested and meet the qualifications, please submit your application letter and CV by email.

Job location:
ETS Montreal (1100 Notre-Dame St W, Montreal, Quebec H3C 1K3) and CentraleSupelec (3 Rue Joliot Curie, 91190 Gif-sur-Yvette)

Attached document: 202401040542_Postdoc Fellowships.pdf

Challenges of Mixed Data Clustering

Offer related to Action/Network: SimpleText/none

Laboratory/Company: DVRC
Duration: 4 months
Contact: sonia.djebali@devinci.fr
Publication deadline: 2024-02-29

Context:
Industrial context

The energy sector is in the midst of a significant transformation, prompted by the need to increase the use of renewable energy sources and improve energy efficiency, with the grid becoming a Smart Grid. This technology allows for the analysis, management, and coordination of energy production, consumption, and distribution, all with the goal of promoting more sustainable practices. A challenge arises from the fact that the data is mixed, containing both numerical and categorical information, often in the form of a data stream. Traditional methods designed for numerical data are not well suited to this type of data, so analyzing it requires adapted methods.
Advanced tools for analyzing complex systems that can handle rich and heterogeneous data are crucial for Trusted Third Parties for Energy Measurement and Performance to provide independent energy performance analysis and recommendations for clients. It is important that these tools are also easily interpretable by energy experts to facilitate classification and recommendation.
Creating clusters of similar buildings is an effective way to handle complex energy data. Hierarchical clustering of mixed data is a crucial approach that allows energy experts to easily associate clusters with recommendations. It is essential not only for the energy sector but also has diverse applications in fields such as biology, medicine, marketing, and economics.

Subject:
Scientific context

Although mixed data is widespread, clustering tools specifically designed for it are limited. Some of the bottlenecks have already been defined in a previous scientific paper. Here is a non-exhaustive list of bottlenecks one can encounter when handling mixed data in a pipeline:

– Data preprocessing: Preprocessing is a critical step in mixed data clustering; it includes handling missing data, encoding categorical data, and scaling numerical data.
– Feature selection: Mixed data clustering requires feature selection to be performed before clustering. However, selecting relevant features can be a challenging and time-consuming task.
– Metric selection: Choosing the right distance metric to measure similarity across different data types is difficult.
– Evaluation: There is a lack of standard evaluation criteria for mixed data clustering, which makes it hard to compare different methods.
– Computational complexity: Mixed data clustering involves dealing with different types of data and distance metrics, which can result in high computational complexity.
– Visualization: It is difficult to create visualizations that effectively communicate the relationships between different data types.
– Interpretation: Understanding the relationships between different data types can be challenging, especially if the clusters are not well-separated or the data are altered before using any methods.
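As an illustration of the metric selection bottleneck, one commonly used option for mixed data is the Gower distance, which averages a range-normalized absolute difference for numerical features and a simple mismatch indicator for categorical ones. The sketch below is a minimal pure-Python version; the building records and feature names (`surface_m2`, `year`, `heating`) are hypothetical, and nothing in the offer states that this metric will be used in the internship.

```python
def numeric_ranges(rows, numeric_features):
    """Range (max - min) of each numerical feature over the dataset."""
    return {
        f: max(r[f] for r in rows) - min(r[f] for r in rows)
        for f in numeric_features
    }


def gower_distance(a, b, ranges):
    """Gower distance between two records given as feature -> value dicts.

    Features listed in `ranges` are treated as numerical; all others are
    treated as categorical.
    """
    total = 0.0
    for name in a:
        if name in ranges:
            # Numerical feature: absolute difference scaled to [0, 1].
            span = ranges[name]
            total += abs(a[name] - b[name]) / span if span > 0 else 0.0
        else:
            # Categorical feature: 0 if equal, 1 otherwise.
            total += 0.0 if a[name] == b[name] else 1.0
    return total / len(a)


# Hypothetical building records mixing numerical and categorical features.
buildings = [
    {"surface_m2": 100.0, "year": 1980, "heating": "gas"},
    {"surface_m2": 300.0, "year": 2000, "heating": "electric"},
    {"surface_m2": 500.0, "year": 2020, "heating": "gas"},
]
ranges = numeric_ranges(buildings, ["surface_m2", "year"])
print(gower_distance(buildings[0], buildings[1], ranges))
```

The resulting pairwise distances can be fed to any clustering method that accepts a precomputed distance matrix, which includes hierarchical clustering.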

Candidate profile:
M1 or M2 student in computer science (Master's program or engineering school).

Required education and skills:
Knowledge of machine learning, clustering, and Python, with experience using ML libraries
Strong interest in academic research
Ability to conduct literature reviews
Rigor, ability to synthesize, autonomy, and ability to work in a team

Job location:
Pole Léonard de Vinci
92 916 Paris La Défense Cedex

Attached document: 202312221037_2024_Stage_MixedData.pdf

M2 internship: Weakly supervised deep learning for lymphoma diagnosis support

Offer related to Action/Network: none/none

Laboratory/Company: GREYC UMR CNRS 6072
Duration: 5-6 months
Contact: olivier.lezoray@unicaen.fr
Publication deadline: 2024-03-31

Context:
Lymphoma is a cancer characterized by the proliferation of cells of the lymphatic system. It is in fact more accurate to speak of "lymphomas" than of "lymphoma", given the variety of cells that can proliferate and the range of severity across the different lymphomas. Diagnosis is made in anatomic pathology and cytopathology from samples taken from the lymph node. From these samples, among other things, a touch imprint cytology slide is prepared: the lymph node cells are deposited on a glass slide and stained so that their characteristics can be observed. However, apart from certain obvious situations, the cells of most lymphoma types have morphological characteristics that are difficult to discriminate with the human eye. It would therefore be worthwhile to train an artificial intelligence program to identify (or not) these characteristics from a bank of lymph node imprint cytology slides for which the lymphoma diagnosis (and type) has been established.

Subject:
Computational pathology is a rapidly growing field that holds great promise for improving access to healthcare. In particular, diagnostic support has evolved considerably in recent years with the use of deep learning approaches. While these methods extract more discriminative features for diagnostic purposes, they are very costly in terms of the volume of data required. Indeed, they require pathologists to produce pixel-level annotations of very large slide scans (on the order of a gigapixel) so that the models can be trained in a supervised manner. To overcome this limitation in digital pathology, weakly supervised approaches have emerged. Here, the scanned slide receives a single annotation, with features coming from the tiles of the scanned slide. For training, either all tiles inherit the slide's label, or bags of tiles inherit the label (multiple instance learning). The medical prediction is then made at the level of the whole slide: a slide is positive if it contains at least one tumor tile. These approaches are very promising [1], and we wish to explore them for diagnostic support on lymph node imprint slides in the context of suspected lymphoma.
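The slide-level rule stated above (a slide is positive if it contains at least one tumor tile) corresponds to max pooling over a bag of tile predictions. The short Python sketch below shows only that aggregation step; the tile probabilities are made-up numbers, the 0.5 threshold is an assumption, and in practice the per-tile scores would come from a trained network (e.g. in PyTorch or TensorFlow).

```python
def slide_score(tile_probabilities):
    """Max pooling over the bag: the slide score is that of its most suspicious tile."""
    if not tile_probabilities:
        raise ValueError("a slide needs at least one tile")
    return max(tile_probabilities)


def slide_is_positive(tile_probabilities, threshold=0.5):
    """A slide is positive as soon as one tile reaches the tumor threshold."""
    return slide_score(tile_probabilities) >= threshold


# Made-up per-tile tumor probabilities for two slides (bags of tiles).
suspicious_slide = [0.05, 0.10, 0.92, 0.20]
benign_slide = [0.05, 0.10, 0.20]
print(slide_is_positive(suspicious_slide), slide_is_positive(benign_slide))
```

Because only the slide-level label is needed for training under this rule, pathologists do not have to annotate individual tiles or pixels.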

Candidate profile:
– Student in a research Master's (M2) program or in the final year of an engineering school, specializing in computer science, imaging, and/or artificial intelligence.

Required education and skills:
– Training in machine learning and deep learning is essential.
– Knowledge of and experience in deep learning and programming (Python, TensorFlow/PyTorch) are required.
– Autonomy and curiosity for scientific research.

Job location:
Laboratories: GREYC laboratory (UMR CNRS 6072), CHU de Normandie
Supervisors: Marie-Laure Quintyn-Ranty (Hospital Practitioner, CHU Caen Normandie), Olivier Lézoray (Professor, UNICAEN), Alexis Lechervy (Associate Professor, UNICAEN).
Internship: 5-6 months, in Caen, at Campus 2, ENSICAEN, Building F.

Attached document: 202312220817_sujetMasterCHU2024.pdf

MaDICS

Masses de Données, Informations et Connaissances en Sciences

Big Data - Data Science
