Link prediction in distributed knowledge graphs

Offer linked to the Action/Network: – — –/– — –

Laboratory/Company: LORIA UMR7503 CNRS-Université de Lorraine
Duration: 6 months
Contact: sabeur.aridhi@loria.fr
Publication deadline: 2022-03-31

Context:
Today, vast and diverse sources of data exist for almost every scientific domain, making their integration and intelligent exploitation challenging. Complex data require expressive representation models such as graphs. The Linked Open Data (LOD) movement, together with the FAIR (Findability, Accessibility, Interoperability, Reusability) data principles, aims to facilitate the integration and analysis of heterogeneous data. In the LOD context, graphs are called knowledge graphs because they encompass domain ontologies for typing objects and describing their relationships. Semantic web languages (RDFS, OWL, SPARQL) have reached a level of maturity on which ambitious machine learning techniques can build. Moreover, big data and NoSQL solutions make web-scale data analyses possible. So far, however, such analyses on dedicated big-data architectures have often been limited to MapReduce scenarios on rather simple data models (key-value stores, or homogeneous graphs with a single type of node and a single type of edge). Graph databases, one family of NoSQL solutions, allow a rich representation of multi-typed, attributed nodes and edges. This greater expressivity comes at a cost, since distributing graphs and programs is not an easy task.

The objective of this Master's project is to advance the state of the art on the link prediction problem in knowledge graphs in a distributed setting [1][2][3]. We will mainly focus on link prediction approaches proposed by the CAPSID team to solve biological problems such as drug discovery.
The proposed distributed approaches will be evaluated on web-scale knowledge graphs by inferring missing links (data completion). YAGO, DBpedia, and synthetic benchmarks can be used for such evaluation and validation purposes [4].

Subject:
This Master's thesis project aims to develop scalable link prediction methods for large and complex graphs. More specifically, the aims of this project are:

– to design scalable implementations of the studied approaches for distributed architectures. In this context, the use of big graph processing frameworks such as Pregel, Trinity, GraphLab and BLADYG needs to be studied [5];
– to define evaluation and validation protocols for the proposed algorithms in the context of web-scale knowledge graphs.
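As an illustration of what such an evaluation protocol can look like, link prediction on knowledge graphs is commonly evaluated by ranking: for each held-out triple (head, relation, tail), every candidate tail is scored, and the rank of the true tail yields metrics such as mean reciprocal rank (MRR) and Hits@k. The sketch below is a minimal version under stated assumptions: the scorer `neighbor_vote_score` is a hypothetical toy heuristic (not the CAPSID team's method) that would be replaced by the model under study, the toy drug triples are invented, and ties are resolved in favour of the true tail for simplicity.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)
ScoreFn = Callable[[str, str, str], float]


def neighbor_vote_score(train: Set[Triple]) -> ScoreFn:
    """Hypothetical toy scorer: every other head that shares at least one
    (relation, tail) pair with h, and is itself linked to t by r, casts one vote."""
    profile = {}
    for h, r, t in train:
        profile.setdefault(h, set()).add((r, t))

    def score(h: str, r: str, t: str) -> float:
        mine = profile.get(h, set())
        return float(sum(
            1 for other, pairs in profile.items()
            if other != h and pairs & mine and (r, t) in pairs
        ))

    return score


def filtered_rank(score: ScoreFn, test: Triple, entities: Iterable[str],
                  known: Set[Triple]) -> int:
    """Rank of the true tail among all entities, in the 'filtered' setting:
    candidates that form another known true triple are skipped.
    Ties are counted in favour of the true tail (a simplification)."""
    h, r, t = test
    true_score = score(h, r, t)
    better = sum(
        1 for e in entities
        if e != t and (h, r, e) not in known and score(h, r, e) > true_score
    )
    return better + 1


def mrr_and_hits(score: ScoreFn, tests: List[Triple], entities: List[str],
                 known: Set[Triple], k: int = 10) -> Tuple[float, float]:
    """Mean reciprocal rank and Hits@k over a list of held-out triples."""
    ranks = [filtered_rank(score, tr, entities, known) for tr in tests]
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits_at_k = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, hits_at_k


if __name__ == "__main__":
    # Invented toy graph; the test triple is the "missing link" to recover.
    train = {
        ("aspirin", "treats", "pain"),
        ("ibuprofen", "treats", "pain"),
        ("ibuprofen", "treats", "fever"),
        ("aspirin", "targets", "COX1"),
        ("ibuprofen", "targets", "COX1"),
    }
    test = [("aspirin", "treats", "fever")]
    entities = sorted({x for h, _, t in train for x in (h, t)})
    print(mrr_and_hits(neighbor_vote_score(train), test, entities, train | set(test), k=1))
```

In a distributed setting, the expensive part is the scoring loop over all candidate entities, which is embarrassingly parallel across test triples.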

This project will be carried out mainly within the CAPSID team at Inria Nancy, which combines expertise in knowledge graphs and distributed graph computing (https://capsid.loria.fr).

Candidate profile:
Candidates must have a bachelor's degree in computer science, mathematics, or one of the physical sciences.

Required education and skills:
Good programming skills in an object-oriented programming language such as Java or C++ are essential. Experience with NoSQL solutions (Neo4j, Titan, MongoDB), parallel/distributed programming (Spark, Hadoop, Flink) and graph processing frameworks (Pregel, GraphLab, GraphX) is desirable but not essential.

Work address:
Laboratoire Lorrain de Recherche en Informatique et ses Applications
LORIA
Campus Scientifique
BP239
54500 Vandoeuvre les Nancy

Special issue Text Complexity and Simplification in Frontiers in Artificial Intelligence / Natural Language Processing

Date: 2022-07-02

Website: https://www.frontiersin.org/research-topics/34050/text-complexity-and-simplification

Submission Deadlines

02 July 2022 Manuscript

Context
Text complexity assessment is one of the pressing problems of our time. Many modern texts, including textbooks and legislative acts, prove too difficult for their intended readers and therefore fail to meet their needs. The same applies to legal, financial and banking documents. Although the first methods for measuring text complexity were proposed over 70 years ago, the problem is far from solved. The diversity of languages, text types and genres, as well as of their audiences, poses major challenges for researchers. Despite the constant growth in the number of scientific publications, their complex language, or readers' lack of scientific literacy, leads many people to avoid these sources in favor of content driven by commercial or political incentives rather than by accuracy and informational value. The same difficulty arises when scientists read documents from disciplines outside their own expertise. Text simplification aims to reduce these barriers. It is used in translation (pre-editing), localization and technical writing. Simplified texts are also more accessible to non-native speakers, young readers, people with reading disabilities, and people with lower levels of education.
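One of the earliest such measures is the Flesch Reading Ease score (1948): 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), where higher scores indicate easier text. The sketch below is a minimal illustration for English only, assuming a crude vowel-group heuristic for syllable counting and naive punctuation-based sentence splitting; real readability tools rely on pronunciation dictionaries or more careful rules.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: each run of vowels (a, e, i, o, u, y) counts as one syllable."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015 * (words/sentence) - 84.6 * (syllables/word)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


if __name__ == "__main__":
    print(round(flesch_reading_ease("The cat sat on the mat."), 3))
    print(round(flesch_reading_ease(
        "Comprehensive institutional regulation necessitates considerable interpretation."), 3))
```

Formulas of this kind capture only surface features (sentence and word length), which is one reason the call emphasizes neural models and richer linguistic features.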

We are looking for contributions in the form of Review, Original Research, Brief Research Report, Perspective, Technology and Code, etc., in areas including, but not limited to, the following:

application of state-of-the-art neural architectures to text simplification and complexity assessment
understanding which features neural networks extract from texts for text simplification and complexity
compiling corpora annotated with complexity labels for training and testing
model evaluation and validation
description of linguistic features relevant to the assessment of the difficulty of various classes of texts
complex word identification
evaluating the dependence on subject areas, text types and genres
text readability for foreign language learners
complexity of web content
text adaptation
scientific multi-document summarization
visualization as text simplification
identification of difficulties preventing the simplification and summarization of texts
metrics of text difficulty
applications in education, law, etc.

Keywords:
neural networks, text, machine learning, complexity, simplification, readability

Our website: www.madics.fr
Follow us on Twitter: @GDR_MADICS
To unsubscribe from the list, follow this link.

Call for participation: 8th AFIA day, PDIA 2022 – AI and Creativity, 7 April 2022, Paris

Date: 2022-04-07
Venue: CNAM, Paris 75003

The Association Française pour l’Intelligence Artificielle (AFIA) is organizing its eighth "Perspectives et Défis de l’IA" (PDIA, Perspectives and Challenges of AI) day on the theme "AI and Creativity", on 7 April 2022 at CNAM, Amphi Georges Friedmann, 2 rue Conté, Paris 75003.

The detailed programme of the day is available at: https://afia.asso.fr/pdia-2022/

For logistical reasons, registering before 1 April 2022 is strongly recommended, via: https://www.linscription.com/pro/activite.php?P1=91152

This event is organized by: Fayçal HAMDI (CEDRIC, CNAM Paris), Engelbert MEPHU NGUIFO (LIMOS, Université Clermont Auvergne), Davy MONTICOLO (ERPI, Université de Lorraine), Fatiha SAIS (LISN, Université Paris Saclay)

Programme

9h–9h15: Welcome
9h15–9h30: Presentation of AFIA and introduction to the day

Session 1: Text, HCI and AI

9h30–10h30: Baptiste Caramiaux, CNRS researcher at the ISIR laboratory, Sorbonne Paris Université, member of the HCI Sorbonne group.
Title: Rethinking Interaction with Learning Technologies

10h30–11h30: Alex Gabriel, postdoctoral researcher at the ERPI laboratory, Université de Lorraine
Title: Artificial intelligence to support ideation and early-stage design

11h30–11h45: Coffee break

11h45–12h45: Anne-Gwenn BOSSER, associate professor (Maîtresse de Conférences) at the STICC laboratory, ENIB, Université de Brest Bretagne Loire
Title: Writing machines: creating programs that create, to learn how to use AI

12h45–14h: Lunch break

Session 2: Arts and AI

14h–15h: Jean-Claude Heudin, AI researcher, writer and composer.
Title: Angelia, an artificial intelligence for electronic music

15h–16h: Jérôme Nika, researcher at IRCAM.
Title: Music and "AI" for "Artificial Instruments"

16h–16h15: Coffee break

16h15–17h15: François Pachet, director of the Spotify Creator Technology Research Lab.
Talk on "Computational creativity in music"

17h15: Closing of the day and discussion

Coverage Measures for Machine Learning Enabled Cyber-Physical Systems

Offer linked to the Action/Network: – — –/– — –

Laboratory/Company: Université Grenoble Alpes (UGA), Verimag Laboratory
Duration: 3 years
Contact: thao.dang@univ-grenoble-alpes.fr
Publication deadline: 2022-08-31

Context:
The thesis is fully funded for three years by a grant from the Auvergne-Rhône-Alpes region, starting in 2022.
The description of the topic can be found at https://www.decyphir.com/PhD_Position_DETAI22.html

Cyber-Physical Systems (CPS) are systems combining software and hardware (cyber) components that interact with their (physical) environment. Typical examples include autonomous cars, robots and medical devices. Mathematically, they are modelled as so-called hybrid systems, i.e., dynamical systems with multiple modes whose dynamics can be continuous or discrete. Since the model includes the physical/biological environment, it can be of arbitrary complexity, from trivial (not all models need to be complex to be useful) to intractable with today's computational resources, owing to the infinite input and state spaces of these systems. Hence new methods and tools are constantly needed to handle the heterogeneous computations and data generated by the analysis and design of hybrid systems.

Subject:
In this thesis, we want to tackle this issue from the angle of coverage measures. Given a CPS problem and some data and/or models (e.g., a hybrid system) associated with it, the questions are: what is the mathematical domain that can represent all the data that could possibly be observed, and can we measure how well the given data represent this domain? These questions are of primary theoretical and practical interest in many contexts. One popular contemporary instance is machine learning (ML). It is well known that ML-based algorithms, which are increasingly used for CPS design, are only as good as the data used to train them. However, it is much less well understood how to formally define the "goodness" of the data at our disposal. Hence there is a need for meaningful measures that can be computed and used not only to quantify the quality of a data set, but also to improve it by, e.g., shrinking or augmenting it so that it better represents the domain to be learned.
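To make the notion of coverage concrete, one very simple (and admittedly naive) measure partitions a box-shaped input domain into a regular grid and reports the fraction of cells containing at least one data point. The sketch below is a hypothetical illustration of that idea only, not the measure the thesis will develop; the grid resolution `bins_per_dim` is an assumed parameter, and the number of cells grows exponentially with the dimension, which is one reason more refined measures are needed.

```python
from typing import Iterable, Sequence, Set, Tuple


def grid_coverage(points: Iterable[Sequence[float]],
                  lower: Sequence[float], upper: Sequence[float],
                  bins_per_dim: int) -> float:
    """Fraction of the grid cells of the box [lower, upper] that contain at
    least one point. Points outside the box are ignored."""
    dim = len(lower)
    occupied: Set[Tuple[int, ...]] = set()
    for p in points:
        if any(not (lo <= x <= hi) for x, lo, hi in zip(p, lower, upper)):
            continue  # outside the domain of interest
        # Cell index per dimension; points on the upper face go to the last cell.
        cell = tuple(
            min(int((x - lo) / (hi - lo) * bins_per_dim), bins_per_dim - 1)
            for x, lo, hi in zip(p, lower, upper)
        )
        occupied.add(cell)
    return len(occupied) / bins_per_dim ** dim


if __name__ == "__main__":
    # Hypothetical samples in the unit square, 2 x 2 grid: two cells are hit.
    samples = [(0.1, 0.1), (0.9, 0.1), (0.2, 0.3)]
    print(grid_coverage(samples, lower=(0.0, 0.0), upper=(1.0, 1.0), bins_per_dim=2))
```

The empty cells also indicate where a data set could be augmented, in line with the "fix it" use of coverage measures mentioned above.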
Coverage, sampling, data augmentation, ML and CPS are not new questions, and they have attracted a lot of interest recently. The originality of this thesis will be to tackle these problems from the perspective of hybrid systems and formal methods, two research directions in which Verimag and Decyphir specialize and for which they are internationally recognized. The intrinsically hybrid nature of the data and systems considered in machine learning for CPS is often overlooked, and we believe it needs to be studied in a more systematic and explicit way. Formal methods make it possible to derive more rigorous guarantees, and the hope is also that, through specification languages such as Signal Temporal Logic (STL), they can help develop "explainable" measures, i.e., measures that are directly related to precisely formulated requirements, as opposed to hard-to-interpret quantities such as the mean squared error, which is the most frequent practice.
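As an example of a requirement-based quantity, Signal Temporal Logic has a quantitative (robustness) semantics: on a sampled signal, the robustness of "always x < c" is the minimum of c - x_t over the samples, and that of "eventually x > c" is the maximum of x_t - c. A positive value means the requirement holds, and its magnitude is the margin, expressed in the units of the signal. The sketch below is a simplified illustration covering only these two operators over a whole finite trace (no time intervals, no nesting); the signal values are hypothetical.

```python
from typing import Sequence


def robustness_always_below(trace: Sequence[float], c: float) -> float:
    """Robustness of G (x < c) over a finite sampled trace: min_t (c - x_t)."""
    return min(c - x for x in trace)


def robustness_eventually_above(trace: Sequence[float], c: float) -> float:
    """Robustness of F (x > c) over a finite sampled trace: max_t (x_t - c)."""
    return max(x - c for x in trace)


if __name__ == "__main__":
    speed = [1.0, 2.5, 1.5]  # hypothetical sampled signal
    print(robustness_always_below(speed, 3.0))      # 0.5: satisfied with margin 0.5
    print(robustness_eventually_above(speed, 2.0))  # 0.5
```

Unlike a mean squared error, a negative value here points directly to which requirement is violated and by how much.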

Candidate profile:
We are looking for candidates with a Master's degree in computer science or control engineering who are interested in CPS, artificial intelligence and machine learning. The thesis is expected to have a strong experimental and development component, but opportunities to develop theoretical contributions are also likely. Consequently, candidates with both theoretical and practical inclinations are welcome to apply.

Required education and skills:
Master's degree in computer science or control engineering

Work address:
Verimag Laboratory, Université Grenoble Alpes (UGA),
700 avenue centrale
38400 Saint Martin D’Hères

EPITA is recruiting faculty members (enseignants-chercheurs)

Offer linked to the Action/Network: – — –/– — –

Laboratory/Company: EPITA research laboratories (LRDE/LSE)
Duration: permanent contract (CDI)
Contact: thierry.geraud@epita.fr
Publication deadline: 2022-04-15

Context:
EPITA is opening several full-time faculty positions (enseignant·e·s-chercheur·e·s) in computer science, with recruitment no later than the start of the 2022-2023 academic year, in particular in the areas of cybersecurity and operating systems.

To support the School's growth at the national level, the positions are to be filled at the following campuses:

– Paris (Kremlin-Bicêtre and Campus Cyber at La Défense),
– Lyon,
– Rennes,
– Strasbourg,
– Toulouse.

Subject:
You will strengthen our teams and research areas on the following topics:

– Data science and engineering, knowledge extraction,
– Machine learning and other subfields of AI,
– Image processing, pattern recognition and vision,
– Automata and their applications (including verification and synthesis),
– Software and performance (including HPC, GPU),
– Software and architecture security: identification, protection, detection and response,
– Low-level systems (kernel, assembly), operating systems, virtual machines and cloud computing,
– Embedded systems (including robotics).

Candidate profile:
Detailed information on these positions, and the link for submitting your application, are available here:

– https://www.lrde.epita.fr/~theo/postes_EPITA_MCF_2022.pdf for MCF (associate professor) profiles,
– https://www.lrde.epita.fr/~theo/postes_EPITA_HDR_2022.pdf for HDR profiles (or candidates about to obtain the HDR).

The application deadline is 15 April 2022.

(The recruitment procedure is described here:
https://tinyurl.com/ProcedureRecrutementEPITA2022)

Required education and skills:
The national qualification for maître·sse de conférences (associate professor) or professeur·e des universités (full professor) positions is not formally required in order to apply.

Attached document: MCF profile.
The description of the HDR profile is available at the link above.

Work address:
– Paris (Kremlin-Bicêtre and Campus Cyber at La Défense),
– Lyon,
– Rennes,
– Strasbourg,
– Toulouse.

Attached document: 202203142010_postes_EPITA_MCF_2022.pdf

ComSciCon France 2022 (Science communication for PhD students)

Date: 2022-07-07 => 2022-07-08
Venue: Marseille, Campus Saint-Charles

ComSciCon France, the free science communication training workshop for PhD students from all disciplines, returns for its 3rd edition on 7 and 8 July 2022 in Marseille, at the Saint-Charles campus.

Since 2020, the first two French editions have been held successfully, bringing together speakers and PhD students from all over France. This workshop, which focuses on practice and interactivity (public speaking, round tables, writing sessions, video/podcast/outreach workshops, etc.), involves recognized figures in science communication. Over the 2 days of the workshop, 40 PhD students will build connections and develop the tools needed to establish new dialogues between science and society.

Whether you already have an outreach project in mind or are simply curious, come and exchange ideas with other PhD students and our speakers. A unique adventure awaits you!

Applications are open from 1 March to 1 April.

More information: https://france.comscicon.com

Associate professor (MCF) position, section 27, at La Rochelle Université

Offer linked to the Action/Network: – — –/– — –

Laboratory/Company: L3i
Duration: 45 years
Contact: mickael.coustaty@univ-lr.fr
Publication deadline: 2022-04-30

Context:
An associate professor (maître·sse de conférences) position in section 27 (computer science) is open at La Rochelle Université.

More information on the required profile is available here:

https://www.galaxie.enseignementsup-recherche.gouv.fr/ensup/ListesPostesPublies/ANTEE/2022_1/0171463Y/FOPC_0171463Y_4202.pdf

Subject:
See the job description.

Candidate profile:
PhD in computer science

Required education and skills:
Doctoral degree (PhD)

Work address:
La Rochelle, France

Decentralized efficient AutoML Federated Learning for heterogeneous embedded devices

Offer linked to the Action/Network: – — –/– — –

Laboratory/Company: Orange Lab / UCA-Inria-CNRS MAASAI Team
Duration: 36 months
Contact: michel.riveill@univ-cotedazur.fr
Publication deadline: 2022-03-14

Context:
The goal of the thesis is to carry out research on decentralized and efficient federated AutoML learning for heterogeneous embedded devices.

The training of AI models for service delivery is currently undergoing a conceptual transformation, as model learning shifts close to the data, embedded on users' devices. These devices have limited resources and must remain fully operational during the learning phase. In addition, users today generate sensitive data, and new collaborative learning algorithms need to be developed and optimized for different embedded devices, ranging from smartphones to IoT devices.

Today, building an AI model typically requires collecting data on a central server (cloud). This approach raises problems of privacy, control over data usage, and computational resources. Federated learning (FL) [1,2] is a new collaborative training approach that addresses these problems: models are trained on users' local data, and only their parameters are exchanged with other participants to build a global model. The challenges of federated learning are (a) obtaining efficient and robust decentralized FL models with heterogeneous data; (b) optimizing resources for actual operational deployment; and (c) customizing services and optimizing models for groups of users based on available resources, because a single global model may be less explainable, less accurate and less appropriate than a personalized model.
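The scheme described above (local training, exchange of parameters only) is commonly instantiated by Federated Averaging (FedAvg), in which a server averages client parameters weighted by local dataset size. The sketch below is a deliberately tiny illustration, assuming a one-parameter linear model y = w·x trained by plain gradient descent on each client; the client data, learning rate, number of local epochs and number of rounds are hypothetical choices, and this is not the framework the thesis will build.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held locally by one client


def local_update(w: float, data: Dataset, lr: float, epochs: int) -> float:
    """Gradient descent on the client's mean squared error for y = w * x."""
    for _ in range(epochs):
        grad = sum(2 * x * (w * x - y) for x, y in data) / len(data)
        w -= lr * grad
    return w


def fedavg(clients: List[Dataset], rounds: int, lr: float = 0.05,
           epochs: int = 5) -> float:
    """Server loop: broadcast w, collect locally trained weights, and average
    them weighted by each client's number of samples. Raw data never leave
    the clients; only the scalar parameter is exchanged."""
    w = 0.0
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        local_weights = [local_update(w, d, lr, epochs) for d in clients]
        w = sum(len(d) * wk for d, wk in zip(clients, local_weights)) / total
    return w


if __name__ == "__main__":
    # Hypothetical clients whose data follow y = 2x over different input ranges.
    client_a = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    client_b = [(0.5, 1.0), (1.0, 2.0)]
    print(round(fedavg([client_a, client_b], rounds=50), 4))
```

With heterogeneous (non-IID) client data, local models drift apart between rounds, which is precisely challenge (a) above.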

We will deploy deep neural networks on users' devices because they achieve high classification/prediction accuracy in various tasks. However, training them requires significant effort to find optimal hyperparameters, which limits their use on resource-constrained devices. Emerging areas address automatic neural network generation [3] and automatic search for appropriate architectures (Neural Architecture Search, NAS), both required for real-world deployments. FL NAS [4] aims at optimizing the architecture of neural network models in the FL environment. Many questions in this domain remain open. For example, no approaches have been developed for FL with clients that share the same sample space but have different feature spaces.

Subject:
The objective of the thesis is to design a federated learning framework that automatically generates low-power neural networks, in compliance with the GDPR [5], for (a) homogeneous and (b) heterogeneous devices under device constraints (availability, resources, states), and to study it in a fully decentralized peer-to-peer federated learning setup.
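In a fully decentralized peer-to-peer setup there is no server to compute the average; one classical alternative is gossip (consensus) averaging, in which each peer repeatedly replaces its parameters with the average of its own and its neighbours' values. The sketch below assumes a fixed ring topology and scalar parameters, both hypothetical simplifications; on a ring every peer has the same degree, so equal weights preserve the global sum and all peers converge to the mean a server would have computed.

```python
from typing import List


def gossip_ring(values: List[float], rounds: int) -> List[float]:
    """Synchronous gossip on a ring: each peer averages itself with its two
    neighbours. The equal 1/3 weights form a doubly stochastic matrix, so the
    sum is preserved and every peer converges to the initial mean
    (assumes at least 3 peers so that the two neighbours are distinct)."""
    x = list(values)
    n = len(x)
    for _ in range(rounds):
        x = [(x[(i - 1) % n] + x[i] + x[(i + 1) % n]) / 3 for i in range(n)]
    return x


if __name__ == "__main__":
    # Hypothetical local parameter values held by four peers (mean = 6.0).
    print([round(v, 6) for v in gossip_ring([0.0, 4.0, 8.0, 12.0], rounds=100)])
```

On irregular topologies, equal neighbour weights no longer preserve the mean, and weighting schemes such as Metropolis-Hastings weights are typically used instead.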

[1] J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh and D. Bacon, "Federated Learning: Strategies for Improving Communication Efficiency," arXiv, 2017, pp. 1-10.

[2] K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, Ch. M. Kiddon, J. Konečný, S. Mazzocchi, B. McMahan, T. Van Overveldt, D. Petrou, D. Ramage and J. Roselander, "Towards Federated Learning at Scale: System Design," SysML, 2019, https://arxiv.org/abs/1902.01046.

[3] A. Wong, M. J. Shafiee, B. Chwyl and F. Li, "FermiNets: Learning generative machines to generate efficient neural networks via generative synthesis," NIPS, 2018, arXiv:1809.05989.

[4] H. Zhu, H. Zhang and Y. Jin, "From Federated Learning to Federated Neural Architecture Search," 2020, https://arxiv.org/pdf/2009.05868.pdf.

[5] Regulation (EU) 2016/679 of the European Parliament and of the Council (Article 30), https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679&from=EN#d1e3265-1-1.

Candidate profile:
You hold a Master's degree in Data Science or Computer Science and are a curious person who likes to learn and to seek solutions. You are highly motivated to do your thesis in the emerging field of distributed algorithms for embedded devices. You have skills in machine learning, optimization and statistics (essential), as well as good programming skills and knowledge of embedded devices (desirable). Interest in signal processing is a plus.

Furthermore, autonomy and open-mindedness are particularly appreciated qualities for research work. Drive, the ability to put forward proposals, and communication skills are also required for this position. English will be used throughout the thesis (reading the state of the art, writing articles and presenting results at international conferences), so an excellent level of English is required.

Required education and skills:
Master's degree or engineering school degree (École d’ingénieur).

Work address:
Orange Lab Sophia Antipolis.

Contact:
– Tamara.TOSIC@orange.com,
– Michel.RIVEILL@univ-cotedazur.fr

Natural Language Processing approaches in the music domain

Offer linked to the Action/Network: Musiscale/– — –

Laboratory/Company: CRIStAL, Inria, Université de Lille
Duration: 3 years
Contact: louis.bigo@univ-lille.fr
Publication deadline: 2022-03-14

Context:
For about ten years, deep neural networks have been the subject of extensive research in natural language processing (NLP). This research has many applications, ranging from corpus analysis to automatic content generation.

The temporal nature of music encourages and facilitates its representation as sequences of elements at different scales, usually chords or notes, comparable to sequences of words. This sequential nature, together with the common view of music as a kind of language, has motivated the use of tools originally designed for NLP tasks in music information retrieval (MIR), for a variety of tasks including automatic music analysis and generation.

Subject:
The central objective of this thesis is to evaluate the adaptability, performance and relevance of NLP techniques when applied to musical data. We will focus in particular on applying three essential NLP principles to music:

* self-attention
* tokenization
* transfer learning
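To illustrate the tokenization step on symbolic music, the sketch below uses a minimal scheme, assumed here for illustration only: each note, given as a (MIDI pitch, duration in beats) pair, becomes a pitch token followed by a duration token, and tokens are then mapped to integer IDs that a BERT- or GPT-style model could consume. The token names (`Pitch_<n>`, `Dur_<d>`) and the note format are hypothetical choices, not the encoding the thesis will adopt.

```python
from typing import Dict, List, Tuple

Note = Tuple[int, float]  # (MIDI pitch number, duration in beats)


def tokenize(notes: List[Note]) -> List[str]:
    """Encode each note as a pitch token followed by a duration token."""
    tokens: List[str] = []
    for pitch, duration in notes:
        tokens.append(f"Pitch_{pitch}")
        tokens.append(f"Dur_{duration}")
    return tokens


def build_vocab(tokens: List[str]) -> Dict[str, int]:
    """Assign integer IDs to tokens in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


if __name__ == "__main__":
    melody = [(60, 1.0), (64, 0.5), (60, 1.0)]  # C4, E4, C4
    toks = tokenize(melody)
    vocab = build_vocab(toks)
    print(toks)
    print([vocab[t] for t in toks])  # [0, 1, 2, 3, 0, 1]
```

Whether notes, chords or sub-note events are the right unit for such tokens is exactly the kind of question raised by applying NLP principles to music.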

These principles will be studied by training and evaluating musical models inspired by major NLP models, including the BERT auto-encoder and the autoregressive GPT model. We will also reflect on the limits of applying natural language models to musical data, from both a technical and an epistemological point of view, and on the prospects for original models specifically suited to modelling musical data.

More details on the page: http://www.algomus.fr/jobs/phd-nlp-en/

Candidate profile:
Strong interest in machine learning, natural language processing, and music.

Required education and skills:
* Master's degree in computer science or equivalent; machine learning; natural language processing
* Musical knowledge and practice desirable.

Work address:
CRIStAL, Inria, Université de Lille, Villeneuve d’Ascq

2 associate professor (MdC) positions in Lille

Offer linked to the Action/Network: – — –/– — –

Laboratory/Company: CRIStAL
Duration: Permanent
Contact: marc.tommasi@univ-lille.fr
Publication deadline: 2022-03-31

Context:
Two associate professor positions in computer science are open in Lille in 2022. The research profiles involve joining either the Scool team or the Magnet team.

Subject:
Scool works on sequential decision-making under uncertainty, in particular reinforcement learning and bandits. Scool approaches these questions from angles ranging from fundamental research to applications, including the design and study of algorithms from both theoretical and computational points of view.
MAGNET works on statistical learning in graphs, decentralized learning and natural language processing. The profile focuses more specifically on machine learning, fairness and privacy.

Candidate profile:

Required education and skills:

Work address:
Applications that have not been preceded by contact with one of the team leaders (Ph. Preux or M. Tommasi) will not be supported.

More information is available here: https://www.cristal.univ-lille.fr/?article25#EC, and of course do not hesitate to also contact the laboratory and the teaching departments.

MaDICS

Masses de Données, Informations et Connaissances en Sciences

Big Data - Data Science
