Postdoc (2 years) at CEA: NLP and genomics

When:
03/02/2023 – 04/02/2023 all-day

Related Action/Network: DOING/– — –

Laboratory/Company: CEA List
Duration: 2 years
Contact: deepgenseq@saxifrage.saclay.cea.fr
Publication deadline: 2023-02-03

Context:
The French CEA (Commissariat à l’Energie Atomique et aux Energies Alternatives) is looking for a Postdoctoral Fellow to join its laboratory of semantic analysis of texts and images.

In this exciting project, you will join an interdisciplinary team working toward predictive and generative artificial intelligence for biology. The team exploits deep contextual language models of biological sequences, whose representations generalize to several applications, such as predicting the effects of mutations.

TERMS & COMPENSATION

This 2-year position is open to a range of candidates, from recent college graduates to more experienced scientists (e.g. post-docs). The chosen candidate’s salary will be commensurate with their level of education, skills, and experience. Other benefits include:
– 48 days of paid holidays
– on-site subsidized restaurant
– partial remote work is possible, up to 3 days per week and 100 days per year
– CEA contribution to the personal company savings plan

Topic:
BACKGROUND

Exponential growth in sequencing throughput, together with the sampling of natural (uncultured) populations, is providing a deeper view of the diversity of protein sequences across the tree of life. Proteins are the molecular engines that sustain cellular life, and the unobserved determinants of their structure and function are encoded in the distribution of observed natural sequences. These vast amounts of (unlabelled) sequences therefore provide evolutionary data that can form the basis for unsupervised learning of predictive and generative models of biological function.

Our focus here will be to train high-capacity Transformer-based language models on sequence data, in a way analogous to natural language understanding, where the semantics of a word is determined from the contexts in which it appears in sentences. The intrinsic organizing principles captured in the resulting representations can then be transferred to different prediction sub-tasks for which only limited experimental data are available, such as predicting the effect of sequence variation on function. Following promising recent results, we also plan to explore zero-shot inference, with no additional training or supervision from experimental data.

This project will be an excellent opportunity for a candidate who is looking to contribute to cutting-edge research and to train with experts in the field. We are seeking a detail-oriented computer scientist and problem solver who is passionate about science.

RESPONSIBILITIES

* Tune and optimize existing unsupervised transformer-based language models for protein sequences.
* Develop and optimize code and machine learning algorithms for predictive models.
* Integrate and analyze large data volumes.
* Interact continuously with scientists in an interdisciplinary team.

Candidate profile:
* Ph.D. or M.Sc. in a quantitative discipline, e.g. Applied Mathematics, Computer Science, Computational Biology, Physics or a closely related discipline.
* Experience with Python, open-source software libraries for machine learning, and Linux (file systems, shell, hardware/software monitoring, etc.).
* Strong mathematical background and analytical skills.
* Effective organizational skills, e.g. the ability to prioritize work and contribute to the planning of a program of scientific research.
* Demonstrated interpersonal skills, including the ability both to work independently and to perform collaborative research in an interdisciplinary team environment.
* Good oral and written communication skills.

Preferred: previous experience with transformer-based techniques for NLP pre-training and with unsupervised transformer language models.

Required education and skills:
* Ph.D. or M.Sc. in a quantitative discipline, e.g. Applied Mathematics, Computer Science, Computational Biology, Physics or a closely related discipline.

Job location:
LOCATION

We are based on the Paris-Saclay research campus, south of Paris.

HOW TO APPLY

Interested candidates should submit a resume and a short cover letter to deepgenseq@saxifrage.saclay.cea.fr.

ABOUT US

About CEA LIST: https://list.cea.fr/en/

About the LASTI lab (Semantic Analysis of Text and Images): https://kalisteo.cea.fr/index.php/ai/

About Genoscope: https://www.genoscope.cns.fr