RESUMES : peRsonal knowlEdge baSe constrUction froM hEterogeneous Sources

10/01/2022 – 11/01/2022 all-day

Offer related to the Action/Network: –/PhD students

Laboratory/Company: Télécom SudParis, SAMOVAR Laboratory, Carian Soft
Duration: 3 years
Contact:
Publication deadline: 2022-01-10

Context:
This is a CIFRE thesis, carried out as a collaboration between Telecom SudParis and Carian Software Development. The position will start before October 2022.

Subject:
RESUMES : peRsonal knowlEdge baSe constrUction froM hEterogeneous Sources

The Web is composed of many documents of different natures, such as texts, images, or videos. These documents cover a wide range of topics, and the information they contain is noisy, unstructured, and ambiguous. Exploiting this variety is therefore a huge challenge. When it comes to information about people, one could use specialized websites such as social media, forums, blogs, or personal websites. However, this raises many problems. For example: How can we extract knowledge about a person from a single source? How can we know that two accounts on two different websites represent the same person? How does a person communicate with others?

This kind of information can be valuable in many applications, and in particular for CV enrichment. Given a candidate’s resume, we would like to complement it with external sources such as LinkedIn, Reddit, or GitHub. These additional clues can help a recruiter make appropriate decisions.

This thesis aims to construct a Personal Knowledge Base (PKB) from information gathered online to complement a resume. A personal knowledge base is a collection of structured statements about a person that can be queried and on which one can reason.

For example, suppose we have a candidate called John. He has a GitHub page that we managed to link to his resume. From his profile, we extracted statements such as “John, knows, Java” and “John, contributes to, Open Source projects”. These statements are now part of his PKB. Next, we find a Stack Overflow account with the same username, and this account has answered many questions about Java. We might suppose that the two accounts belong to the same person, and therefore we can complete John’s PKB with what the second account reveals. If we also know that John is a potential candidate for a company working on open source projects written in Java, we can boost his resume and present the additional information to help the recruiter.

Candidate profile:
See below.

Required education and skills:
For this thesis, we will consider candidates with a master's or engineering degree and skills in several of the following areas:
* Fluent written and spoken English. Some knowledge of French can be useful.
* Machine/Deep Learning
* Natural Language Processing
* Strong proficiency in a programming language such as Python, and experience in software development
* Information extraction
* Knowledge bases/Ontologies
* Logic and automated reasoning
* Semantic Web and Web crawling
* Experience in a research laboratory

Work address:
Telecom SudParis, 9 Rue Charles Fourier, 91000 Evry-Courcouronnes, France
Telecom SudParis, 19 place Marguerite Perey, 91120 Palaiseau, France

Attached document: 202111041617_SujetTheseCIFRE.pdf