PhD thesis position – ARTEXT4LOD – n-ARy relaTions EXTraction for Linked Open Data

When:
15/06/2018 – 16/06/2018 all-day

Announcement related to the Action/Network: none

Laboratory/Company: Link, IATE, TETIS
Duration: 36 months
Contact: dibie@agroparistech.fr, mathieu.roche@cirad.fr, patrice.buche@inra.fr
Publication deadline: 2018/06/15

Context:
A 3-year PhD student position is available in the ARTEXT4LOD project, a MUSE project (Univ. Montpellier) involving INRA IATE JRU, AgroParisTech & INRA MIA JRU and CIRAD TETIS JRU.

Benefits
Monthly Gross Salary: 1 800 euros / Duration: 36 months

Subject:
The goal of this PhD thesis is to enrich a knowledge base with n-ary relations extracted from textual scientific documents. The ARTEXT4LOD PhD project aims to ease the extraction of experimental data from scientific documents available online. Experimental data are represented as n-ary relations in which the studied object is modeled as a symbolic argument and its features as quantitative arguments, each associated with its attributes, i.e. a numerical value and a measurement unit.
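For readers unfamiliar with this representation, the n-ary relation model described above can be sketched roughly as follows. This is only an illustrative sketch, not the project's actual data model: the class names, the relation name and all values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class QuantitativeArgument:
    """A feature of the studied object: a named quantity with value and unit."""
    name: str     # hypothetical feature name, e.g. "thickness"
    value: float  # numerical value as found in the text
    unit: str     # measurement unit as found in the text (not normalized here)


@dataclass
class NaryRelation:
    """One experimental result: a symbolic argument plus quantitative arguments."""
    relation: str        # hypothetical relation name
    studied_object: str  # symbolic argument
    features: List[QuantitativeArgument] = field(default_factory=list)

    def arity(self) -> int:
        # The studied object counts as one argument, each feature as another.
        return 1 + len(self.features)


# Hypothetical instance; none of these values come from the project.
example = NaryRelation(
    relation="oxygen_permeability_relation",
    studied_object="packaging film",
    features=[
        QuantitativeArgument("thickness", 50.0, "µm"),
        QuantitativeArgument("temperature", 23.0, "°C"),
        QuantitativeArgument("relative_humidity", 50.0, "%"),
    ],
)
print(example.arity())
```
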
The PhD fellow will have to explore two main research directions:
– To exploit meta-data extracted from textual scientific documents, such as figures, tables, captions or structural information (e.g. abstract, summary), to drive the identification and extraction of n-ary relations. This first task is very difficult because the textual features are not necessarily normalized (e.g. the units of measurement). It will consist of extending and improving the first results of the PhD work of Soumia Lilia Berrahou (Berrahou et al. 2017).
– To take into account expert knowledge, which could be learnt throughout the identification and extraction of n-ary relations. This allows the iterative and incremental localization, identification, extraction and annotation of n-ary relations using expert knowledge. This could be done using original methods based on active learning approaches (Silva and Silva 2007; Martinez Alonso et al. 2015) and Relevance Feedback (Harashima and Kurohashi 2011; Valcarce et al. 2018), as used for Information Retrieval tasks.

References
– L. Berrahou, P. Buche, J. Dibie-Barthelemy, M. Roche (2017) Xart: Discovery of correlated arguments of n-ary relations in text. Expert Syst. Appl. 80: 244-262.
– C. Silva, B. Silva. On text-based mining with active learning and background knowledge using SVM. Soft Computing, Volume 11 Issue 6, Pages 519 – 530, 2007
– H. Martinez Alonso, B. Plank, A. Johannsen, A. Søgaard. Active learning for sense annotation. Proceedings of the 20th Nordic Conference of Computational Linguistics, pages 245-249, 2015
– J. Harashima, S. Kurohashi. Relevance Feedback using Latent Information. Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 1037-1045, 2011
– D. Valcarce, J. Parapar, Á. Barreiro. LiMe: Linear Methods for Pseudo-Relevance Feedback. In SAC 2018: SAC Symposium on Applied Computing, April 9-13, 2018, Pau, France. ACM, 2018

Candidate profile:
The successful candidate should hold a recent Master's degree in Computer Science or an equivalent degree.

Required education and skills:
Strong background in ontologies and data mining. Programming skills are essential, and software engineering experience is welcome.

Work address:
Contacts:
Patrice Buche patrice.buche@inra.fr
Juliette Dibie dibie@agroparistech.fr
Mathieu Roche mathieu.roche@cirad.fr

Please send a complete CV, a motivation letter with a statement of research interest, a copy of your Master's degree and a list of at least 2 references (names and contact information).

Attached document: