Offer related to the Action/Network: – – –/PhD students
Laboratory/Company: Sorbonne Université – Laboratoire d’Informatique
Duration: 36 months
Contact: email@example.com
Publication deadline: 2021-11-21
Full description: https://mlia.lip6.fr/wp-content/uploads/2021/09/PhD-proposal-Deep-Learning-for-Data-to-Text-and-Text-to-Data-Generation-1.pdf
Knowledge sources are often encoded in structured formats such as indexes, tables, triplets, ontologies, knowledge bases, or even raw numerical data. These data are easily readable by machines but hard for humans to interpret. Conversely, textual information, while easily accessible to humans, is often difficult for machines to exploit. A key challenge and an emerging field in machine learning and natural language processing is the transcription of structured data into text, together with the inverse problem of transforming raw text into structured data. The former problem is called data-to-text generation; it arises in applications such as journalism, medical diagnosis, and financial reporting, and it may also serve as a component of explainable AI systems. The latter problem is known as semantic parsing and comes in several forms, such as information extraction, reasoning over structured data (tables or graphs), and generating symbolic queries.
The research will explore new paradigms for the dual tasks of data-to-text and text-to-data generation, such as:
• Learning from unaligned corpora
Most current methods require learning from parallel corpora, where data and text are fully aligned and correspond closely to each other. A first line of research will develop new unsupervised frameworks that allow training on unaligned data-text corpora.
• Learning from diverse sources
Current benchmarks focus on learning mappings from a single structured data format to text. In practice, data are collected from different sources and encoded in a variety of formats. A second direction will explore new formalisms for learning such multiple correspondences.
• Controlled text and data generation
Current research mainly focuses on cases where there is a one-to-one (bijective) correspondence between data and text. A more general task is to summarize information along different aspects of the data. We will explore how to control generation according to these different aspects and to user needs.
Candidate profile:
Master's degree in computer science or applied mathematics, or an engineering school degree. Strong background and experience in machine learning and/or natural language processing, and good programming skills.
Required education and skills:
Machine learning and deep learning
Experience with or interest in natural language processing
Strong computer programming skills
Work address:
Sorbonne Université, Pierre et Marie Curie Campus, 4 Place Jussieu, Paris, France