First call for papers: IC @ PFIA 2023

Date: 2022-12-05 => 2023-07-07
Location: Strasbourg, France.

Call for papers for IC 2023 (34th Journées Francophones d’Ingénierie des Connaissances)
as part of the PFIA 2023 platform (Plate-Forme de l’Intelligence Artificielle)
from 3 to 7 July 2023, in Strasbourg, France.

—————————————-
About the conference
—————————————-
The Journées francophones d’Ingénierie des Connaissances (IC) have been held every year since 1997, first under the aegis of Gracq (Groupe de Recherche en Acquisition des Connaissances) and then under that of AFIA’s SIC college (Science de l’Ingénierie des Connaissances). Once again this year, IC is hosted by the PFIA platform, together with other French-speaking conferences in the field of artificial intelligence (AI).

Knowledge engineering can be seen as the area of Artificial Intelligence that accompanies the evolution of information and communication sciences and technologies, which are transforming individual and collective practices. It aims to contribute to its development by building models, methods and tools for acquiring, representing and integrating knowledge so that this knowledge can be exploited in computing environments with widely varying characteristics. The formal representation of this knowledge enables automated reasoning over the knowledge and over the associated data, which may be complex, heterogeneous and evolving. Its ultimate goal is to build systems that assist humans in their activities and decision-making.

The Ingénierie des Connaissances conference brings together the French-speaking community and is a venue for exchange and reflection, and for presenting and confronting theories, practices, methods and tools. This community must now take into account the rise of machine learning algorithms and their impact on individual and collective practices, while keeping humans at the center of data and knowledge systems.

—————————————–
Conference topics
—————————————–
Submissions on the topic “contributions of knowledge graphs to neuro-symbolic machine learning approaches in knowledge engineering” will be particularly welcome. We also encourage submissions describing work, either original or already published internationally, with a theoretical, methodological or practical scope, on any of the topics listed below (non-exhaustive list):

Knowledge engineering for the Web
Storage and querying of distributed knowledge
Semantic Web, Web of data, social Web, Web of things
Knowledge representation, ontologies
Knowledge models: design, evolution, evaluation, exploitation, life cycle
Modeling and formalization: formal and informal languages, standardization
Methods and tools for ontology engineering: alignment, integration, modularity, merging, metrics, design patterns, visualization
Design and reuse of foundational ontologies, core ontologies, domain ontologies, interoperability, terminologies

From data to knowledge
Knowledge extraction and acquisition, ontology population, semantic annotation
Knowledge acquisition from texts, images, unstructured data and interactions
Engineering of collaborative systems, crowdsourcing
Processing of and reasoning over knowledge
Knowledge engineering and data mining

Data and knowledge quality
Knowledge engineering and complex data: multimedia, multilingual, temporal, spatial, multi-scale, imprecise or uncertain data
Ownership and security in knowledge-based systems
Provenance and trust in data, truth discovery, uncertainty
Metrics and quality evaluation of data and knowledge

Reasoning and learning
Inference and business rules
Logical reasoning, approximations, statistical reasoning, analogical reasoning, case-based reasoning, reasoning in non-classical logics
Computing knowledge graph embeddings
Deep learning and knowledge graphs

Applications of knowledge engineering and lessons learned
Information retrieval, indexing, recommendation
Human-computer interaction: visualization of data, knowledge and interconnections, interfaces to knowledge-based systems, explanations
Conversational agents
Knowledge-based recommender systems
Adaptation, personalization: user profiles, context models and adaptation, emotion models
Processing of massive, heterogeneous data
Applications to the life sciences, agriculture, culture, education, industry, economics, law, business intelligence (BI), etc.

—————————————-
Important dates
—————————————-
Paper submission: 1 March 2023
Author notification: 15 April 2023
Camera-ready versions due: 15 May 2023
Conference dates: 3 to 7 July 2023

—————————————-
Submissions
—————————————-
The call for contributions for the 2023 edition of IC covers several types of submissions:

Original research papers (academic or applied/industrial)
– Long papers presenting original, validated work (at most 10 pages including references; 20 min oral presentation, 10 min discussion)
– Short papers presenting original work with preliminary results (at most 6 pages including references; 15 min oral presentation, 5 min discussion)
– Posters and demonstrations, accompanied by an abstract of at most 4 pages including references (presented during the platform’s poster/demo sessions). For demonstrations, we recommend including in the abstract a link to a video demonstrating the tool/software.

Position papers
– Position papers offering a retrospective on work in a well-identified area related to the conference topics, and putting forward a view on the next major scientific challenges in that area (at most 6 pages including references; 15 min oral presentation, 10 min discussion)

Previously published research papers
– Papers already published in international conferences or journals but not yet published in French. Submissions must be written in French and include a reference to the published paper (at most 2 pages including references).

A best paper award will be presented by the program committee during the conference.

—————————————-
Committees
—————————————-
Program committee chair: Cassia Trojahn (Université Toulouse 2 Jean Jaurès, IRIT)
Program committee: being formed
Organizing committee chair: Thomas Guyet (INRIA)

Our website: www.madics.fr
Follow us on Twitter: @GDR_MADICS
To unsubscribe from the list, follow this link.

PhD Defense: Explainable Classification of Uncertain Time Series

Date: 2022-12-13
Location: ISIMA, Salle du Conseil (A102) and online

Hello,

I hope you are doing well.

I have the great pleasure of inviting you to my PhD defense, entitled Explainable Classification of Uncertain Time Series. The defense will take place on 13 December 2022 at 2 pm in room A102 (Salle du Conseil) at ISIMA. You are also invited to share drinks and candies in room A104 right after the defense.

How to attend remotely? There are two ways to follow the defense remotely:
– By Microsoft Teams using this link: https://teams.microsoft.com/l/meetup-join/19%3ameeting_YWNmMDQ1MDAtYWFlOC00MDNjLWE3NTMtNjY5ODkxOTVhMDFm%40thread.v2/0?context=%7b%22Tid%22%3a%225a16bd04-b475-49ff-b11a-c6c8359db1b1%22%2c%22Oid%22%3a%22949eb4b9-6120-456f-95a8-6ec37948db76%22%7d
– By YouTube using this link: https://youtu.be/EW1Wp3Fg-1Q. Feel free to leave a thumbs-up if you liked the presentation and a thumbs-down if you did not. I will also be happy to read any comments you may have about it.

Here is the abstract of the presentation: Time series classification is one of the most studied theoretical and applied fields of time series analysis. Many classical machine learning and deep learning algorithms have been developed during the last decade to perform time series classification accurately. However, the case where the time series are uncertain remains under-explored. In this work, we discuss the importance of handling uncertainty in machine learning in general and in time series classification in particular. We propose efficient, robust and explainable methods for the classification of uncertain time series. We assess our methods on simulated datasets, and also on a real scenario in astrophysics in which uncertainty is preponderant. The results we obtained are understandable and trustworthy for astronomers. Our proposed methods are tools that will help us understand the universe we live in and, more generally, advance the field of uncertain time series classification.

Here is the composition of the jury:
Anthony BAGNALL (R) – University of East Anglia
Sebastien DESTERCKE (R) – Heudiasyc, University of Technology of Compiegne
Elisa FROMONT (E) – IRISA, University of Rennes 1
Emmanuel GANGLER (E) – LPC, University Clermont Auvergne
David HILL (E) – LIMOS, University Clermont Auvergne
Themis PALPANAS (E) – LIPADE, Universite Paris Cite
Engelbert MEPHU NGUIFO (A) – LIMOS, University Clermont Auvergne
(R): Reviewer, (E): Examiner, (A): Advisor

I am looking forward to defending my work in front of you.

Best regards


ADBIS 2023 – call for tutorial proposals

Date: 2022-12-13
Location: Barcelona, Spain

Call for tutorials

ADBIS 2023 invites submissions for tutorial proposals on all topics of potential interest to the conference attendees. Tutorial proposals should cover state-of-the-art research, development, and applications in specific data management or information systems related areas, and stimulate and facilitate future work. Proposals must provide an in-depth survey of the selected topic with the option of describing specific works in depth.

The topic of the tutorial should be broad enough to attract a significant audience, and the proposal must include enough detail to convey both the scope of the material to be covered and the depth to which it will be covered. Tutorials on interdisciplinary directions bridging the scientific research and applied communities, on novel and fast-growing directions, and on significant applications, as well as tutorials with hands-on sessions, are highly encouraged.

Important Dates

All deadlines below are AoE (Anywhere on Earth).

Submission deadline: April 20, 2023
Notification: May 15, 2023
Camera-ready abstract overview due: June 15, 2023
Slides availability: September 3, 2023

Submission Guidelines

Tutorial submissions must be submitted electronically, in pdf format, to each of the tutorial chairs.

Submissions should be formatted using the LNCS style templates, with a maximum length of 8 pages, inclusive of ALL material. Any submitted paper violating the length, file type, or formatting requirements will be desk rejected.

Tutorials will be selected based on technical quality, significance of the topic, and relevance to ADBIS.
Originality will be considered a plus. Accepted tutorials will be considered for publication in the conference or workshop proceedings.

Proposals should include:

Title of the tutorial
Names, affiliations and email addresses of the presenters
Overview of tutorial, with justification of its relevance and timeliness
Target audience and assumed background
Related recent tutorials and how the proposed tutorial is different or novel compared to those
Scope and structure: enough detail to provide a sense of both the scope of material to be covered and the depth to which it will be covered
Brief professional biographies of presenters, with a note on their background in the area of the tutorial

Authors of accepted tutorials are encouraged to provide their own recording of the tutorial, for dissemination via the conference website. In any case, the presenters are expected to attend the live event and give the tutorial in person, not just play a pre-recorded video.

Tutorial Chairs
Patrick Marcel
Boris Novikov


FL-Day – Decentralized Federated Learning: Approaches and Challenges

Date: 2022-12-13
Location: Université Paris-Saclay
Digiteo building amphitheater (LISN)
Campus Universitaire, Rue Raimond Castaing, bâtiment 650
91190 Gif-sur-Yvette

The ADAM team of the DAVID laboratory and the DATAIA Institute are co-organizing a workshop on Federated Learning, to be held in the Digiteo building amphitheater (LISN) on Tuesday 10 January 2023.

Registration is mandatory and free (subject to available seats)
Link: https://www.dataia.eu/evenements/workshop-fl-day-decentralized-federated-learning-approaches-and-challenges

Through several talks, the day will address issues related to “Decentralized Federated Learning”, ranging from machine learning to decentralized data processing (Edge Computing) and data protection (privacy) in a decentralized setting, with illustrations from various domains. The talks will be followed by a round table.

Participants who wish to do so are invited to present posters of their work during the breaks; please send them to the organizers below:
Karine ZEITOUNI – karine.zeitouni@uvsq.fr
Zaineb CHELLY – zaineb.chelly-dagdia@uvsq.fr
Mustapha LEBBAH – mustapha.lebbah@uvsq.fr

A buffet lunch and refreshment breaks will be provided during the day.
==
Invited speakers:
==
AURÉLIEN BELLET – DR INRIA LILLE, CRISTAL TEAM

Title: Better Privacy Guarantees for Decentralized Federated Learning
Abstract: Fully decentralized algorithms, in which participants exchange peer-to-peer messages along the edges of a network graph, are increasingly popular in federated learning because of their scalability and efficiency. Intuitively, decentralized algorithms should also offer better privacy guarantees, since nodes only observe the messages sent by their neighbors in the graph. But formalizing and quantifying this gain is challenging: existing results are limited to local differential privacy (LDP) guarantees, which overlook the benefits of decentralization. In this talk, I will present appropriate relaxations of differential privacy and show how they can be used to prove stronger privacy guarantees for decentralized SGD, matching the privacy-utility trade-off of centralized SGD in some settings. Interestingly, some of these algorithms amplify privacy guarantees as a function of the distance between nodes in the graph, which aligns well with users’ privacy expectations in some use cases.
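The intuition in this abstract (nodes only exchange messages with their graph neighbors) can be made concrete with a minimal decentralized gradient-descent sketch in Python/NumPy. It assumes a ring topology, toy quadratic local losses and plain gossip averaging; the function names are hypothetical, and it implements none of the differential-privacy relaxations or noise calibration discussed in the talk.

```python
import numpy as np


def ring_mixing_matrix(n: int) -> np.ndarray:
    """Symmetric, doubly stochastic gossip matrix for a ring of n >= 3 nodes.

    Each node averages its own model with those of its two neighbours (weight 1/3 each).
    """
    if n < 3:
        raise ValueError("a ring needs at least 3 nodes")
    w = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            w[i, j % n] = 1.0 / 3.0
    return w


def decentralized_sgd(targets: np.ndarray, steps: int = 200, lr: float = 0.1) -> np.ndarray:
    """Decentralized gradient descent on local losses f_i(x) = 0.5 * ||x - targets[i]||^2.

    Row i of the result is node i's model. No node ever sees a global average:
    each iteration is a local gradient step followed by averaging with graph neighbours only.
    """
    n, d = targets.shape
    w = ring_mixing_matrix(n)
    x = np.zeros((n, d))
    for _ in range(steps):
        grads = x - targets        # exact local gradients (a private variant would add noise here)
        x = w @ (x - lr * grads)   # local step, then one round of gossip averaging
    return x


# Toy run: 8 nodes, each holding a private 2-D target.
targets = np.arange(16, dtype=float).reshape(8, 2)
models = decentralized_sgd(targets)
print(models.mean(axis=0), targets.mean(axis=0))
```

For these quadratic losses, the doubly stochastic mixing matrix makes the network-wide average of the models follow exactly the centralized gradient-descent trajectory, so that average converges to the global optimum (the mean of the targets); with a constant step size, individual nodes keep some disagreement around it, which shrinks as `lr` decreases.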
—
SONIA BENMOKHTAR – DR CNRS, LIRIS, LYON

Title: Decentralized Learning (as an enabler) for Decentralized Online Services

Abstract: There is a strong push toward data-driven services at all levels of society and industry. This started with large-scale web applications such as web search engines (e.g., Google, Bing), social networks (e.g., Facebook, Twitter) and recommender systems (e.g., Amazon, Netflix), and is becoming increasingly pervasive with the adoption of wearable devices and the advent of the Internet of Things. All these services are made possible by the availability of large computing infrastructures, strong progress in artificial intelligence (AI) and in particular machine learning, and the ability to collect and aggregate large amounts of data about users, their environments and their organizations in cloud infrastructures. But while progress in AI/ML and distributed infrastructures has been considerable, the data-driven applications enabled by these advances raise significant concerns about their users’ privacy and can lead to threats such as censorship, loss of control over personal data, and data leaks. More recently, initiatives such as Web 3.0 promise to decentralize online services; AI/ML plays a crucial role at their core, giving users the ability to regain control of their personal data and preventing a handful of economic players from concentrating too much decision-making power.
—
HAKIM HACID – PRINCIPAL RESEARCHER, TII, ABU DHABI, UAE (GROUPE AIDRC)

Title: Towards Edge AI: Principles, current state, and perspectives

Abstract: The artificial intelligence (AI) community has invested heavily in developing techniques capable of digesting very large amounts of data to extract value-added information and knowledge. Most of these techniques, in particular deep learning models, require substantial computing and storage power, which makes them suited to cloud-based environments. Intelligence is therefore located far from the end user, which raises concerns about, for example, data privacy and latency. Edge AI addresses some of the problems inherent to the cloud and focuses on the best practices, architectures and processes for extending data-driven AI beyond the cloud. Edge AI brings AI closer to the end user and, for example, uses fewer communication resources, since processing is performed directly on the edge device. This talk will introduce Edge AI and give an overview of existing work and of potential directions for future contributions.

We look forward to seeing many of you there!

Best regards,
The organizing committee


24 months post-doctoral position: Deep learning strategies to model complex systems

Related Action/Network: none/none

Laboratory/Company: IMT Atlantique
Duration: 24 months
Contact: carlos.granero-belinchon@imt-atlantique.fr
Publication deadline: 2023-04-01

Context:
This project is multidisciplinary and focuses on the development of new Deep Learning models for non-linear multiscale description of complex systems. Applications in different topics such as fluid turbulence, remote sensing and ocean dynamics can be considered.

Subject:
The main objective of this postdoc position is to formulate multiscale DL models able to extract non-linear couplings. Moreover, we want these models 1) to be based on the physics of the studied system, i.e., to follow a physics-guided learning approach, and 2) to be interpretable from a physics point of view. To this end, both the loss function and the model architectures will be adapted. In addition, in order to emulate chaotic and complex systems, a stochastic component will be included and the uncertainties of the reconstructed states will be quantified.

Thus, this project aims to reconstruct the unknown states of the studied complex system from physical knowledge of the system and available data that can be spatially distant, prior in time, at coarser resolution etc. We can then envisage physics-informed super-resolution, data generation and forecasting (Fablet et al. 2021) among other applications.

For example, in the case of ocean modelling, the multiscale and non-linear nature of ocean surface dynamics plays a fundamental role in biogeochemical, ecological and climatic processes; consequently, its characterization is a major topic in current oceanographic research. Today, ocean dynamics can be studied through a large variety of remote sensing images of the ocean surface (Yahia et al. 2010, Renosh et al. 2015, Qiu et al. 2020) as well as through numerical simulations (Lellouche et al. 2021).
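To illustrate what adapting the loss function toward physics-guided learning can look like, here is a minimal NumPy sketch of a composite loss for a 2-D velocity field: a data-fidelity term restricted to observed cells, plus a penalty on the field's divergence. The incompressibility constraint, the boolean observation mask and the weight `lam` are illustrative assumptions, not the project’s formulation; in practice such a loss would be written in PyTorch so that it can be differentiated through the network.

```python
import numpy as np


def divergence(u: np.ndarray, v: np.ndarray, dx: float = 1.0, dy: float = 1.0) -> np.ndarray:
    """Finite-difference divergence du/dx + dv/dy on a regular grid (axis 0 = y, axis 1 = x)."""
    du_dx = np.gradient(u, dx, axis=1)
    dv_dy = np.gradient(v, dy, axis=0)
    return du_dx + dv_dy


def physics_guided_loss(u_pred, v_pred, u_obs, v_obs, mask, lam=0.1):
    """Data term on observed cells only, plus a penalty on non-zero divergence everywhere.

    mask is a boolean array marking where observations exist (e.g. sparse or coarse data);
    the physics term assumes an incompressible 2-D flow (an illustrative choice).
    """
    sq_err = (u_pred - u_obs) ** 2 + (v_pred - v_obs) ** 2
    data_term = np.mean(sq_err[mask])
    physics_term = np.mean(divergence(u_pred, v_pred) ** 2)
    return data_term + lam * physics_term


# A divergence-free field derived from a stream function psi: u = -dpsi/dy, v = dpsi/dx.
y, x = np.mgrid[0:32, 0:32].astype(float)
psi = np.sin(x / 5.0) * np.cos(y / 7.0)
u, v = -np.gradient(psi, axis=0), np.gradient(psi, axis=1)
print(np.abs(divergence(u, v)).max())  # zero up to floating-point rounding
```

The weight `lam` trades off fidelity to the (possibly sparse) observations against consistency with the assumed physics; a field that fits the data but violates the constraint is penalized even where no observation exists.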

Candidate profile:
Candidates are required to have a PhD in Deep Learning/Machine Learning with strong experience in neural networks. The candidate must have spent at least 18 months in a non-French laboratory between May 1, 2019 and the start of the project.

Education and required skills:
Good skills in Python, PyTorch and PyTorch Lightning are also required, as well as experience working in a team. Previous experience in a multidisciplinary research team will also be considered a plus. Ideally (but not necessarily), the candidate will have previous experience in fluid physics and/or oceanography.

Work address:
The postdoc will work in collaboration with Carlos Granero-Belinchon and Ronan Fablet from IMT Atlantique, Simon van Gennip from Mercator Ocean International, and Bertrand Chapron from Ifremer. The research team is thus composed of physicists, oceanographers and artificial intelligence researchers from different laboratories, making this a multidisciplinary project. Moreover, the postdoc will work within the OSE research team at IMT (https://cia-oceanix.github.io/), a dynamic research group on image processing and artificial intelligence for oceanography and climate. The postdoc will also be part of the new Inria team Odyssey (https://team.inria.fr/odyssey/).

The post-doctoral position is a two-year full-time appointment starting in 2023. The gross salary will depend on the candidate’s experience, up to approx. 55,000 €/year (net salary: up to approx. 30,000 €/year). The candidate will also benefit from French social insurance and will have up to 45 days of annual leave. The candidate will be able to work remotely for up to 90 days per year.
The candidate will be based at the IMT Atlantique campus (Brest), in a dynamic and stimulating working environment a five-minute walk from the beach.
Within the framework of the ANR JCJC project SCALES, the postdoc will have funding for conference participation, publication fees and visits to external laboratories. Moreover, within the framework of the ANR Chair OCEANIX, the postdoc will have access to compute servers: Datarmor and the OSE servers at IMT Atlantique.
Teaching activities at IMT Atlantique, mainly in signal processing, computer vision and artificial intelligence, will also be offered to the postdoc. These activities, which come with additional salary, are not mandatory.
Motivated candidates should send a CV and a motivation letter to: carlos.granero-belinchon@imt-atlantique.fr.

The Postdoc is expected to start during 2023.

Attached document: 202212021346_Postdoc_ANR-SAD_v2.pdf

Data pipelines in the cloud: elastic execution with dynamic parallelism

Related Action/Network: none/Innovation

Laboratory/Company: LIP6/Sorbonne Université and SAP France
Duration: 6 months
Contact: bernd.amann@lip6.fr
Publication deadline: 2023-04-01

Context:
Nowadays, institutions and companies manage their data with a wide variety of applications that were not designed to communicate with each other. At the same time, there is a very strong need to design new data management and analysis services that add value to the existing data. Since it is practically impossible to migrate all applications and their data into an integrated system, the current solution is to build analytic data pipelines that facilitate the data flow between operations performing complex processing, including collecting data from multiple sources, transforming it, generating AI models through learning, and storing it in multiple destinations. In practice, a data pipeline can contain hundreds of operations, and it can evolve repeatedly as new operations or new data are added. Thus, with the increasing number of pipelines to be designed and deployed, it is crucial to have high-level data pipeline definition languages, tools to deploy and control the execution of data pipelines, and efficient solutions to optimize the execution of complex operations on large volumes of data.

In this context, SAP has developed the SAP Data Intelligence (DI) software for the automatic configuration and deployment of data pipelines. These pipelines use a flow-based programming model [3]. Each pipeline operation corresponds to a program (Python, Node.js, …) or a call to an external API (e.g., a Spark job) that is deployed using an adapted Docker [2] image/container. Kubernetes services provide deployment and orchestration of these images on hyperscaler platforms such as AWS, Google Cloud, Azure, etc.

A performance problem arises at large scale when a pipeline contains long-running operations processing massive data. A first solution for parallelizing operators was designed in the context of an SAP/LIP6 internship [4]. In this solution, the way an operator consumes and produces data is described using data sorting and partitioning functions. This allows the data to be partitioned and distributed so that operators can process it in parallel. The principle of the method is to first define the properties of a “divide and conquer” mapping in the JSON configuration of an operator. These properties make it possible to automatically transform a DI pipeline into a new, parallelized DI pipeline with several replicas (identical copies) of the initial operator, each running in parallel on a different part of the operator’s input data. A “dispatch” operator is injected into the data pipeline to split the input data stream into partitions, and a “collect” operator is injected to aggregate the outputs of the replicas into a single output data stream. The first experiments show that this parallelization solution improves the performance of data pipelines but does not achieve optimal performance in real environments, where the degree of operator replication (data parallelization) must be estimated and dynamically adapted to the volumes of data exchanged, the computations performed and the available resources.
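The dispatch, replicate and collect transformation described above can be sketched in plain Python. This is not the SAP Data Intelligence API: the message layout `(sequence number, key, payload)`, the crc32 key partitioning and all function names are assumptions made for illustration, and threads stand in for containerized replicas.

```python
import heapq
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical message layout: (sequence number, partition key, payload).
Message = tuple[int, str, int]


def dispatch(messages: list[Message], n_replicas: int) -> list[list[Message]]:
    """Split the input stream into n_replicas partitions by hashing the key.

    All messages with the same key go to the same replica (needed for stateful operators),
    and each partition keeps the original sequence order.
    """
    partitions: list[list[Message]] = [[] for _ in range(n_replicas)]
    for msg in messages:
        partitions[zlib.crc32(msg[1].encode()) % n_replicas].append(msg)
    return partitions


def collect(outputs: list[list[Message]]) -> list[Message]:
    """Merge replica outputs into one stream, restoring the global sequence order."""
    return list(heapq.merge(*outputs, key=lambda m: m[0]))


def running_sum(partition: list[Message]) -> list[Message]:
    """Example stateful operator: running sum of payloads per key."""
    state: dict[str, int] = {}
    out: list[Message] = []
    for seq, key, payload in partition:
        state[key] = state.get(key, 0) + payload
        out.append((seq, key, state[key]))
    return out


def run_parallel(messages: list[Message],
                 operator: Callable[[list[Message]], list[Message]],
                 n_replicas: int) -> list[Message]:
    """dispatch -> n_replicas copies of the operator running in parallel -> collect."""
    partitions = dispatch(messages, n_replicas)
    with ThreadPoolExecutor(max_workers=n_replicas) as pool:
        outputs = list(pool.map(operator, partitions))
    return collect(outputs)


stream = [(i, "abc"[i % 3], i) for i in range(12)]
assert run_parallel(stream, running_sum, n_replicas=2) == running_sum(stream)
```

Hash partitioning on the key is what makes the rewrite safe for a stateful operator such as `running_sum`: each key’s state lives in exactly one replica, and the k-way merge in `collect` restores the original order because every replica emits its partition in sequence order.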

Subject:
The objective of this internship is to propose new methods to facilitate and optimize the deployment and execution of parallelized data pipelines. This raises several scientific and technical challenges:

• Estimating the replication degree: How many replicas should be deployed for each operation to be processed in parallel? To answer this question, we need to estimate the benefit of parallel processing as a function of the number of replicas, the amount of data to be processed and the CPU consumption of an operation. This benefit must also be related to the cost of using the machines running data pipelines in the cloud, in order to determine an optimal number of replicas for a certain budget.

• Elastic deployment: How can we adapt the number of replicas to dynamic changes in available resources and associated costs? This calls for new solutions that allow the number of replicas (degree of parallelism) of an operator to be changed dynamically without interrupting the pipeline.
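The first challenge (choosing a replication degree for a given budget) can be framed as a small optimization problem. The sketch below uses a hypothetical cost model: an Amdahl-style run time extended with a per-replica overhead, and a price billed per replica-second. The model, its parameters and the numbers in the example are all assumptions, not measurements from SAP DI. Re-running the search whenever prices or available resources change is one simple way to approach the elastic-deployment question.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OperatorProfile:
    """Hypothetical cost-model inputs for one operator (all values are assumptions)."""
    n_messages: int                # amount of data to process
    cpu_s_per_message: float       # CPU time per message on one replica
    fixed_overhead_s: float        # cost of the injected dispatch/collect operators
    per_replica_overhead_s: float  # start-up/coordination cost that grows with the replica count
    price_per_replica_s: float     # cloud price of running one replica for one second


def estimated_time(p: OperatorProfile, k: int) -> float:
    """Amdahl-style estimate: parallel work shrinks as 1/k, coordination grows linearly in k."""
    return p.fixed_overhead_s + p.n_messages * p.cpu_s_per_message / k + p.per_replica_overhead_s * k


def estimated_cost(p: OperatorProfile, k: int) -> float:
    """k replicas billed for the whole estimated run time."""
    return k * estimated_time(p, k) * p.price_per_replica_s


def best_replication_degree(p: OperatorProfile, budget: float,
                            max_replicas: int = 64) -> Optional[int]:
    """Fastest replica count whose estimated cost fits the budget, or None if none does."""
    feasible = [k for k in range(1, max_replicas + 1) if estimated_cost(p, k) <= budget]
    return min(feasible, key=lambda k: estimated_time(p, k)) if feasible else None


profile = OperatorProfile(n_messages=10_000, cpu_s_per_message=0.01, fixed_overhead_s=5.0,
                          per_replica_overhead_s=1.0, price_per_replica_s=0.001)
print(best_replication_degree(profile, budget=float("inf")))  # unconstrained choice
print(best_replication_degree(profile, budget=0.16))          # budget-limited choice
```

With these made-up numbers, the estimated time 5 + 100/k + k is minimal at k = 10, so the unconstrained choice is 10 replicas, while a budget of 0.16 limits the choice to 5.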

Internship goals and tasks

Internship #1. The goal of the first internship is to evaluate the performance of the parallelization method on different types of stateful operators by varying the CPU load of the operator, the size of the operator’s state, and the size of the messages dispatched to the replicas. The evaluation will be run on a Kubernetes cluster deployed on a hyperscaler platform. Through this evaluation, we expect to identify the configuration parameters that yield the greatest parallelization benefit, as well as ways to improve the parallelization method.

Tasks:

• Propose a model to estimate the overhead incurred by adding operations that partition data and distribute it to replicas in the pipeline.

• Design a method to observe the execution of the pipeline and detect an overload (or underload) situation.

• Determine the new degree of parallelism that will improve pipeline performance.
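For the overload/underload detection and resizing tasks, one common heuristic is to estimate operator utilization over a sliding window and resize proportionally, similar in spirit to the Kubernetes Horizontal Pod Autoscaler rule desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The class, thresholds and target value below are hypothetical choices for illustration, not part of the described method.

```python
import math
from collections import deque


class LoadMonitor:
    """Sliding-window estimate of the load on one parallelized operator (hypothetical helper)."""

    def __init__(self, window: int = 100):
        self.arrivals: deque[float] = deque(maxlen=window)       # arrival timestamps, in seconds
        self.service_times: deque[float] = deque(maxlen=window)  # processing time per message, in seconds

    def record(self, arrival_ts: float, service_s: float) -> None:
        self.arrivals.append(arrival_ts)
        self.service_times.append(service_s)

    def utilization(self, n_replicas: int) -> float:
        """Queueing-style utilization: arrival rate * mean service time / number of replicas."""
        if len(self.arrivals) < 2 or self.arrivals[-1] <= self.arrivals[0]:
            return 0.0  # not enough observations to estimate an arrival rate
        rate = (len(self.arrivals) - 1) / (self.arrivals[-1] - self.arrivals[0])
        mean_service = sum(self.service_times) / len(self.service_times)
        return rate * mean_service / n_replicas


def recommend_replicas(monitor: LoadMonitor, n_replicas: int, target: float = 0.7,
                       low: float = 0.3, high: float = 0.9, max_replicas: int = 64) -> int:
    """Keep the current degree while utilization is in [low, high]; otherwise resize toward target."""
    rho = monitor.utilization(n_replicas)
    if low <= rho <= high:
        return n_replicas
    needed = math.ceil(rho * n_replicas / target)
    return max(1, min(max_replicas, needed))
```

For example, 10 messages per second that each need 0.5 s of processing represent about 5 replicas’ worth of work; with a target utilization of 0.7, `recommend_replicas` suggests ceil(5 / 0.7) = 8 replicas whether the operator currently runs 2 replicas (overloaded) or 20 (underloaded).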

Internship #2. The goal of the second internship is to implement dynamic dispatch and collect operators which automatically adapt to the scaling up or down of the number of replicas of a parallelized operator. For the dispatch operator, the strategy must guarantee that no message is lost in case of scaling down. For the collect operator, the strategy must guarantee that all messages produced by the replicas are properly collected and possibly re-ordered in case of scaling up.

Tasks:

• Design a technical solution to dynamically change the number of running operator replicas and adapt the dispatch and collect operators accordingly.

• Conduct experiments using data pipeline examples to check the validity of the implemented strategies and measure their possible overhead.
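The two guarantees required in internship #2 (no message lost when scaling down, and all replica outputs collected and re-ordered) can be illustrated with an in-memory Python sketch. The queue-per-replica dispatcher, the sequence-number reordering buffer and all names are assumptions; a real DI operator would also have to migrate per-key state between replicas, which this sketch leaves out.

```python
import heapq
import zlib
from collections import deque

Message = tuple[int, str, int]  # hypothetical layout: (sequence number, key, payload)


def partition_of(key: str, n_replicas: int) -> int:
    """Stable key-to-replica mapping (crc32 is deterministic across processes, unlike hash())."""
    return zlib.crc32(key.encode()) % n_replicas


class ElasticDispatcher:
    """Dispatch operator with one pending-message queue per replica."""

    def __init__(self, n_replicas: int):
        self.queues: list[deque[Message]] = [deque() for _ in range(n_replicas)]

    def send(self, msg: Message) -> None:
        self.queues[partition_of(msg[1], len(self.queues))].append(msg)

    def scale_to(self, n_replicas: int) -> None:
        """Change the number of replicas without dropping pending messages.

        Pending messages are drained, sorted by sequence number and re-dispatched,
        so nothing is lost on scale-down and per-key order is preserved.
        """
        pending = sorted((m for q in self.queues for m in q), key=lambda m: m[0])
        self.queues = [deque() for _ in range(n_replicas)]
        for msg in pending:
            self.send(msg)


class ReorderingCollector:
    """Collect operator that releases outputs strictly in sequence-number order."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.buffer: list[Message] = []  # min-heap keyed on sequence number
        self.output: list[Message] = []

    def receive(self, msg: Message) -> None:
        heapq.heappush(self.buffer, msg)
        while self.buffer and self.buffer[0][0] == self.next_seq:
            self.output.append(heapq.heappop(self.buffer))
            self.next_seq += 1


def drain(dispatcher: ElasticDispatcher, collector: ReorderingCollector) -> None:
    """Simulate replicas working round-robin; outputs reach the collector out of order."""
    while any(dispatcher.queues):
        for q in dispatcher.queues:
            if q:
                seq, key, payload = q.popleft()
                collector.receive((seq, key, payload * 2))  # toy stateless operator


d, c = ElasticDispatcher(3), ReorderingCollector()
for i in range(12):
    d.send((i, f"k{i % 4}", i))
d.scale_to(2)  # scale down while messages are still pending
for i in range(12, 24):
    d.send((i, f"k{i % 4}", i))
drain(d, c)
```

Sorting the drained messages by sequence number before re-dispatching keeps each key’s messages in their original relative order, and the collector’s heap only releases a message once every earlier sequence number has arrived.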

The solutions will be deployed in the SAP DI environment. Comparative experiments will be implemented on the Spark parallel computing platform. For this, a solution will be designed to transform the pipeline description (written with Data Intelligence syntax) into a Spark pipeline [1] (pyspark syntax).

References

[1] Michael Armbrust, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, and Matei Zaharia. Spark SQL: relational data processing in spark. In ACM SIGMOD International Conference on Management of Data, pages 1383–1394, 2015.

[2] David Bernstein. Containers and cloud: From lxc to docker to kubernetes. IEEE Cloud Computing, 1(3):81–84, 2014.

[3] Tanmaya Mahapatra. High-level graphical programming for big data applications. Master’s thesis, Technische Universität München (TUM), 2019.

[4] Ludgy Vestris. Scaling up stateful and order preserving operators in DI data pipelines. Master’s thesis, CNAM, SAP – LIP6, 2022.

Candidate profile:
The candidate should have excellent skills in algorithms and programming (Python, Java) and advanced knowledge of optimization and parallelization techniques (query optimization, data parallelism, map-reduce, …); some technical knowledge of Docker/Kubernetes is also helpful. To apply, send a CV and your grades for the last three semesters of study to the three co-supervisors (see email above).

Education and required skills:
Final year of a Master’s degree or engineering school program

Work address:
• SAP France (Levallois-Perret)
• LIP6 Database team (Paris): http://www-bd.lip6.fr/

Attached document: 202212021339_Stage_LIP6_SAP_2023-3.pdf

EDM 2023: the 16th International Conference on Educational Data Mining

Date: 2023-07-11 => 2023-07-14
Location: Bangalore, India

16th International Conference on Educational Data Mining (EDM 2023)
Bangalore, July 11-14, 2023

Call for Papers

It is a pleasure to invite you to Educational Data Mining (EDM 2023). Educational Data Mining is a leading international forum for high-quality research that mines datasets to answer educational research questions, including exploring how people learn and how they teach. The overarching goal of the Educational Data Mining research community is to support learners and teachers more effectively by developing data-driven understandings of learning and teaching processes in a wide variety of contexts and for diverse learners. The conference will take place on the Indian Institute of Science campus, Bengaluru, India, on July 11-14, 2023.

The theme of this year’s conference is “Educational data mining for amplifying human potential”. EDM-2023 particularly welcomes papers focusing on concepts, principles, and techniques mined from educational data for enhancing the potential of all the stakeholders in the education system. Papers describing applications and case studies are especially welcome.

Topics of Interest
– Models and new techniques for mining educational data
– Domain Knowledge Modeling
– Educational Recommenders, Instructional Sequencing, and Personalized Learning
– Learner Cognitive and Behavior Modeling and its association with performance
– Learner Knowledge and Performance Modeling
– Social and Collaborative Learning
– Reproducibility
– Equity, Privacy, Transparency, and Fairness

Important Dates (anywhere on Earth)
– Workshop and Tutorial proposals: December 5, 2022
– Full/short papers abstract: Jan 13, 2023
– Full/short papers, industry papers, posters and demos, doctoral consortium papers: Jan 20, 2023

For any inquiries regarding the program, please contact: edm2023.conf@gmail.com
We look forward to seeing you at EDM 2023.

Sincerely,

—
EDM2023 Program Committee

Direct link

Our website: www.madics.fr
Follow us on Twitter: @GDR_MADICS
To unsubscribe from the list, follow this link.

PAKDD 2023 in Osaka: Call for Papers (Dec 7) and Workshops (Dec 1)

Date: 2023-05-25 => 2023-05-28
Location: Osaka, Japan

===============================================================

[Call for Papers]

The 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining

(PAKDD 2023)

http://pakdd2023.org/

Conference date: May 25-28, 2023 – Osaka, Japan (Onsite/online hybrid)

Paper Submission Deadline: Dec. 7, 2022

===============================================================

The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) is one of the longest established and leading international conferences in the areas of data mining and knowledge discovery. It provides an international forum for researchers and industry practitioners to share their new ideas, original research results, and practical development experiences from all KDD-related areas, including data mining, data warehousing, machine learning, artificial intelligence, databases, statistics, knowledge engineering, visualization, decision-making systems, and emerging applications.

The 27th edition of PAKDD will be held in Osaka, Japan, from May 25 to May 28, 2023.

The conference will be held in a hybrid format (onsite and online).

[Topics]

PAKDD2023 welcomes high-quality, original, and previously unpublished submissions on the theories, technologies, and applications of all aspects of knowledge discovery and data mining. Topics of relevance for the conference include, but are not limited to, the following:

Methods and algorithms:

Anomaly and outlier detection, Association rules, Classification, Clustering, Data mining pipelines, Deep learning, Dimensionality reduction and feature selection, Ethics and fairness, Graphs and networks, Interpretability and explainability, Kernel methods, Matrices and tensors, Online and streaming algorithms, Parallel and distributed mining, Probabilistic models and statistical inference, Regression, Reinforcement learning, Relational learning, Security and privacy, Semi-supervised and unsupervised learning, Theoretical foundations, Transfer learning and meta learning, and Visualization and user interface.

Applications:

Big data, Computational Advertising, Financial data, Information retrieval and search, Internet of Things, Intrusion and fraud detection, Medical and biological data, Multimedia and multimodal data, Recommender systems, Robotics, Scientific data, Social network analysis, Spatio-temporal data, Texts, web, social media, and Time-series and streaming data.

[Paper Submission]

Paper submission must be in English. All papers will be double-blind reviewed by the Program Committee based on technical quality, relevance to data mining, originality, significance, and clarity. All paper submissions will be handled electronically. Papers that do not comply with the Submission Policy will be rejected without review.

Each submitted paper must include an abstract of up to 200 words and be no longer than 12 single-spaced pages with a 10pt font size (including references, appendices, etc.). Authors must follow the Springer LNCS/LNAI manuscript submission guidelines. All papers must be submitted electronically through the CMT paper submission system, in PDF format only. Supplementary material may be submitted as a separate PDF file, but reviewers are not obligated to consider it, so your manuscript should stand on its own merits without any supplementary material. Supplementary material will not be published in the proceedings.

Any submission to PAKDD must not be already published or under review at another archival conference or journal. Papers on arXiv do not violate this rule as long as the submitted paper does not cite them. Submitting a paper to the conference means that, if the paper is accepted, at least one author will complete the regular registration and attend the conference to present the paper. Papers whose authors do not show up will not be included in the proceedings.

The conference will confer several awards among the submissions, including the Best Paper Award, the Best Student Paper Award, and the Best Application Paper Award.

Springer will publish the proceedings of the conference as a volume of the LNAI series.

[Double-Blind Review]

Paper submission must adhere to the double-blind review policy. Submissions must remove all details identifying the author(s) from the original manuscript (including the supplementary files, if any), and the author(s) should refer to their prior work in the third person and include all relevant citations.

Because of the double-blind review process, non-anonymous papers that have been issued as technical reports or similar cannot be considered for PAKDD2023. An exception to this rule applies to manuscripts that were published in arXiv not later than October 24, 2022, i.e., at least a month before PAKDD’s submission deadline.

The author list and order cannot be changed after the paper is submitted.

[Formatting Template]

Formatting Template: https://www.springer.com/gp/computer-science/lncs/conference-proceedings-guidelines.

All manuscripts must be prepared and submitted in accordance with the above format. Use of other formats may lead to disqualification of the paper.

[Submission Site]

https://cmt3.research.microsoft.com/PAKDD2023

[Important Dates]

Paper Submission Deadline: Dec 7, 2022

Paper Acceptance Notification: Feb 7, 2023

Camera Ready Papers Due: Mar 10, 2023

*All deadlines are 23:59 Pacific Standard Time (PST)
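For readers outside the Pacific time zone, the minimal Python sketch below converts the two submission deadlines above to UTC. It assumes a fixed UTC-8 offset, because the call explicitly states Pacific Standard Time. The dictionary keys simply reuse the labels from the list above.

```python
from datetime import datetime, timedelta, timezone

# Pacific Standard Time is a fixed UTC-8 offset; the call states PST explicitly,
# so no daylight-saving rule is applied.
PST = timezone(timedelta(hours=-8), "PST")

DEADLINES_PST = {
    "Paper Submission Deadline": datetime(2022, 12, 7, 23, 59, tzinfo=PST),
    "Camera Ready Papers Due": datetime(2023, 3, 10, 23, 59, tzinfo=PST),
}


def to_utc(deadline: datetime) -> datetime:
    """Return the same instant expressed in UTC."""
    return deadline.astimezone(timezone.utc)


if __name__ == "__main__":
    for label, when in DEADLINES_PST.items():
        print(f"{label}: {to_utc(when):%Y-%m-%d %H:%M} UTC")
```

For example, 23:59 PST on December 7, 2022 corresponds to 07:59 UTC on December 8, 2022.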

[Contact Information]

Program Co-Chairs of PAKDD2023

Hisashi Kashima, Wen-Chih Peng, Tsuyoshi Ide

pakdd2023@gmail.com

PAKDD 2023 Call for Workshops
https://pakdd2023.org/cfw/

The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) is one of the longest established and leading international conferences in the areas of data mining and knowledge discovery. The 27th edition of PAKDD will be held in Osaka, Japan, from May 25 to May 28, 2023. The PAKDD 2023 organizing committee invites workshop proposals on foundational and emerging topics in areas related to data mining. The PAKDD workshops provide an informal and vibrant opportunity for researchers and industry practitioners to share their research positions, original research results, and practical development experiences on specific challenges and emerging issues. Each workshop should focus on a cohesive theme so that participants can benefit from interacting with each other.

## Topics of Interest
Workshop topics typically match those identified in the PAKDD 2023 call for papers, but proposals concerned with other areas of data mining and knowledge discovery are welcome. Interdisciplinary workshops that explore the convergence of data mining and knowledge discovery with various disciplines are also encouraged.

## Format
Workshops are scheduled to be held at the beginning of the conference, on May 25, 2023. Workshops will be held entirely online. Workshop papers will not be included in the conference proceedings but will be made available on the PAKDD 2023 webpage.

## Duties
The organizers of accepted workshops are expected to disseminate the call for papers, gather submissions, form the program committees, conduct the reviewing process, and decide upon the final workshop program.

## Submission
The workshop proposal should contain the following information:
– Title of the workshop
– Objectives, scope, and contribution to the main conference
– Names, affiliations and contacts of the organizers
– Tentative list of the program committee members
– Length of the workshop (full day or half day)
– Expected number of submissions and attendees
Workshop proposals should be submitted by December 1st, 2022 at 11:59PM (PST). Please prepare a PDF (maximum three pages) that contains the aforementioned contents and send it to pakdd2023workshop@gmail.com.

## Important Dates
Workshop proposal submission deadline: December 1, 2022
Workshop proposal acceptance notification: December 21, 2022
Workshop CFP deadline: January 15, 2023
Workshop paper deadline: February 28, 2023
Workshop paper acceptance notification: March 31, 2023
Workshop paper camera-ready: April 25, 2023
*All deadlines are 23:59 Pacific Standard Time (PST)

## Contact Information
Workshop Co-Chairs of PAKDD2023
Yukino Baba, Jill-Jênn Vie
pakdd2023workshop@gmail.com

—
Jill-Jênn
https://jjv.ie

Implementation of a user interface for the interactive exploration of a set of extracted patterns

Offer related to the Action/Network: DSChem/– — –

Laboratory/Company: Groupe de recherche en informatique, image, automa
Duration: 6 months
Contact: bertrand.cuissart@unicaen.fr
Publication deadline: 2023-02-01

Context:

This posting offers a 6-month internship for a fifth-year computer science student (M2 or engineering school). The internship is part of the ANR project ANR-20-CE23-0023 InvolvD. Its main subject is the development of a user interface, an essential tool that will allow pharmacy experts to benefit from our new algorithm for exploring experimental results. The internship will be supervised by Ronan Bureau, Bertrand Cuissart and Etienne Lehembre. The intern will be hosted at GREYC, the computer science laboratory of the Université de Caen Normandie.

Subject:
As part of InvolvD, we recently developed an algorithm designed to assist an expert in discovering a space of structured data. Since the algorithm has proven itself in tests with synthetic oracles, we now wish to move on to a concrete experimental phase involving human experts. The internship consists of building the user interface that will allow a pharmacist to select the parts of the results that are of primary interest to them.
The objects under study are labeled graphs called pharmacophores [2], produced by a data mining computation. The set of pharmacophores is structured by the inclusion relation between graphs. The interface aims to give the expert an efficient way to browse this structure in order to feed the reinforcement learning algorithm. Since the goal is to limit the expert's frustration and loss of attention [1], it is important that the interaction not be reduced to a sequence of questions and answers.
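To make "browsing a structure ordered by inclusion" concrete, the minimal Python sketch below computes the cover relation (the Hasse diagram) of a small set of patterns and the one-step neighbors of a pattern. It rests on a loudly stated simplification: each pattern is a frozenset of labeled edges, so graph inclusion reduces to set inclusion, whereas real pharmacophores would require a subgraph-isomorphism test. The pattern names, feature labels and function names are all hypothetical.

```python
# Hypothetical simplification: each pattern is a frozenset of labeled edges,
# so graph inclusion reduces to set inclusion. Real pharmacophores would need
# a subgraph-isomorphism test instead of the `<` operator used below.
Patterns = dict[str, frozenset]


def cover_pairs(patterns: Patterns) -> list[tuple[str, str]]:
    """Return the Hasse diagram edges (general, specific) of the inclusion order.

    `b` covers `a` when a < b strictly and no pattern `c` satisfies a < c < b.
    """
    names = sorted(patterns)
    below = {(a, b) for a in names for b in names if patterns[a] < patterns[b]}
    return [
        (a, b)
        for a, b in sorted(below)
        if not any((a, c) in below and (c, b) in below for c in names)
    ]


def neighbors(name: str, pairs: list[tuple[str, str]]) -> tuple[list[str], list[str]]:
    """Return the more general and the more specific patterns adjacent to `name`."""
    general = sorted(a for a, b in pairs if b == name)
    specific = sorted(b for a, b in pairs if a == name)
    return general, specific


# Hypothetical patterns over made-up feature labels (donor, acceptor, hydrophobic).
EXAMPLE = {
    "P1": frozenset({("Don", "Acc")}),
    "P2": frozenset({("Acc", "Hyd")}),
    "P3": frozenset({("Don", "Acc"), ("Acc", "Hyd")}),
    "P4": frozenset({("Don", "Acc"), ("Acc", "Hyd"), ("Hyd", "Hyd")}),
}
```

Here `cover_pairs(EXAMPLE)` yields `[("P1", "P3"), ("P2", "P3"), ("P3", "P4")]` and `neighbors("P3", cover_pairs(EXAMPLE))` gives `(["P1", "P2"], ["P4"])`. Browsing therefore moves one step toward a more general or a more specific pattern rather than asking isolated questions. The all-pairs test is cubic in the number of patterns in the worst case, which is fine for a sketch but not for large result sets.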
Based on the expert's answers, the algorithm updates the interest of each pharmacophore for the analysis. To reflect this evolution, the visualization of the structure being browsed must change accordingly.
Once the interface is built, the work will continue with an evaluation of the performance of the algorithm that estimates the interest of the pharmacophores. For this part of the work, interdisciplinary exchanges with pharmacy researchers will be essential.
Finally, the internship will conclude with more open-ended work on the strategy used to browse the pharmacophores. One may favor an exploitation strategy, associated with a step-by-step traversal from each pattern to its neighbors; one may opt for an exploration strategy that favors pharmacophores located in regions little explored by the analysis; or one may devise compromises between these two strategies.
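The three options above correspond to the classic exploration/exploitation trade-off. One common compromise is an epsilon-greedy rule, swapped in here purely as an illustration and not taken from InvolvD. With `epsilon = 0` it performs pure step-by-step exploitation, with `epsilon = 1` pure exploration, and with values in between a mix of the two. The running-mean interest score, the expert scores in [0, 1] and all names are assumptions of this Python sketch.

```python
import random


def record_feedback(pattern: str, score: float,
                    interest: dict[str, float], visits: dict[str, int]) -> None:
    """Fold one expert score into the running-mean interest of `pattern`."""
    visits[pattern] += 1
    interest[pattern] += (score - interest[pattern]) / visits[pattern]


def choose_next(current: str, neighbors_of: dict[str, list[str]],
                interest: dict[str, float], visits: dict[str, int],
                epsilon: float, rng: random.Random) -> str:
    """Epsilon-greedy choice of the next pattern to show the expert.

    With probability `epsilon` (or when `current` has no neighbors), explore:
    jump to the least-visited other pattern. Otherwise exploit: step to the
    neighbor of `current` with the highest estimated interest.
    """
    if rng.random() < epsilon or not neighbors_of[current]:
        others = [p for p in visits if p != current]
        return min(others, key=lambda p: (visits[p], p))
    return max(neighbors_of[current], key=lambda p: (interest[p], p))
```

Passing a seeded `random.Random` keeps runs reproducible, which helps when comparing strategies with the pharmacy researchers. Ties are broken by pattern name so that the choice is deterministic.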

Candidate profile:
Planned technologies
The interface will follow a classic MVC (Model-View-Controller) design, in which the model is the provided C++ code. The code will need to be integrated into a Python wrapper in order to implement the controllers that communicate with the view, which will use Dash Cytoscape; Cytoscape is an existing graph visualization software.
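On the view side, Dash Cytoscape's `Cytoscape` component takes an `elements` list in the Cytoscape.js format. Each node is a dict with `data.id` (and optionally `data.label`), and each edge is a dict with `data.source` and `data.target`. The Python sketch below plays the role of a controller: it turns a set of hypothetical (general, specific) inclusion pairs and interest scores into that list. Storing `interest` on each node and mapping it to node size through the stylesheet are assumptions, and the sketch does not call the C++ model; the wrapper would supply the pairs and scores.

```python
def to_cytoscape_elements(pairs: list[tuple[str, str]],
                          interest: dict[str, float]) -> list[dict]:
    """Build a Cytoscape.js-style `elements` list for the view.

    Nodes carry the pattern name and its current interest score; edges go
    from the more general to the more specific pattern.
    """
    nodes = [
        {"data": {"id": name, "label": name, "interest": score}}
        for name, score in sorted(interest.items())
    ]
    edges = [
        {"data": {"id": f"{a}->{b}", "source": a, "target": b}}
        for a, b in pairs
    ]
    return nodes + edges


# Cytoscape.js style: node size grows with interest (assumed to lie in [0, 1]).
STYLESHEET = [
    {
        "selector": "node",
        "style": {
            "label": "data(label)",
            "width": "mapData(interest, 0, 1, 20, 60)",
            "height": "mapData(interest, 0, 1, 20, 60)",
        },
    },
]
```

In a Dash app, these would typically be passed as `cyto.Cytoscape(id="patterns", elements=elements, stylesheet=STYLESHEET, layout={"name": "breadthfirst"})` after `import dash_cytoscape as cyto`. A callback that recomputes `elements` after each expert answer is one way to make the view follow the updated interest scores.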
What the internship offers
The intern will join the CODAG team at GREYC, a computer science laboratory in Normandy. Because the ANR InvolvD project involves researchers from several French laboratories, the student will have the opportunity to interact with several specialists in the context of interdisciplinary research. These exchanges will be complemented by integration into the Caen chemoinformatics group, which has about twenty members and meets monthly. The student will thus have several opportunities to present their work in a collaborative setting. Moreover, since this is academic research, the work will lead to a scientific communication submitted to the research community (poster, workshop, conference paper, or journal article). Finally, through this internship, the student will acquire valuable knowledge of chemoinformatics, an interdisciplinary field that aims to advance computing methods in order to better understand the world of chemistry.

Required education and skills:
The internship is intended for a fifth-year computer science student (M2 or engineering school).

Work location:
6 Boulevard du Maréchal Juin
Bâtiment Sciences 3
CS 14032, 14032 CAEN cedex 5

Attached document: 202212011043_stage_greyc.pdf

Interoperability and knowledge engineering engineer

Offer related to the Action/Network: – — –/– — –

Laboratory/Company: Mission science ouverte de l’IRD (Institut de rec
Duration: 2 years
Contact: jean-christophe.desconnets@ird.fr
Publication deadline: 2023-02-01

Context:
An attractive position
Reporting to Jean Christophe Desconnets, an engineer at IRD, your mission will be, on the one hand, to continue the work already started on building terminological reference vocabularies and, on the other hand, to improve the harvesting of data and metadata from IRD's scientific information systems in order to build a first knowledge base of IRD's digital scientific outputs.

Subject:
Your activities will be as follows:

Continue building IRD's disciplinary reference vocabularies, with particular emphasis on formalizing them and aligning them with national and international geographic reference vocabularies.
Develop routines to gather, formalize and structure scientific metadata according to current web standards, so that the metadata can be interlinked and queried.
Take part in the urbanization of the information systems (enterprise architecture planning) by defining, together with the stakeholders, the requirements for designing or creating institutional databases.
Get involved in the data science community to carry out technology watch and to exchange with other institutes within national and international initiatives (CoSO, RDA).
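The metadata and alignment activities above can be pictured with a small, stdlib-only Python sketch. It expresses a metadata record as RDF triples using Dublin Core Terms (`dcterms:title`, `dcterms:spatial`) and adds a SKOS `exactMatch` link aligning a local place term with an external geographic reference. It then serializes the triples to N-Triples and runs a lookup that follows the alignment. All IRIs are hypothetical `example.org`/`example.net` placeholders. The choice of these vocabularies is an assumption, since the posting does not say which standards IRD uses. A real pipeline would rely on an RDF library and SPARQL rather than hand-written matching.

```python
DCTERMS = "http://purl.org/dc/terms/"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical IRIs: a dataset, a term from a local geographic vocabulary,
# and the external reference term it is aligned with.
DATASET = "http://example.org/dataset/42"
LOCAL_PLACE = "http://example.org/vocab/place/dakar"
EXTERNAL_PLACE = "http://example.net/geo/dakar"

TRIPLES = [
    (DATASET, DCTERMS + "title", ("Rainfall measurements, Dakar", "en")),
    (DATASET, DCTERMS + "spatial", LOCAL_PLACE),
    (LOCAL_PLACE, SKOS + "exactMatch", EXTERNAL_PLACE),
]


def term(obj) -> str:
    """Render an object as an N-Triples IRI or language-tagged literal."""
    if isinstance(obj, tuple):  # (text, language tag) pair -> literal
        text, lang = obj
        escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"@{lang}'
    return f"<{obj}>"


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> {term(o)} ." for s, p, o in triples)


def datasets_about(external: str, triples) -> list[str]:
    """Find datasets whose spatial term is aligned with an external IRI."""
    local = {s for s, p, o in triples if p == SKOS + "exactMatch" and o == external}
    return sorted(s for s, p, o in triples if p == DCTERMS + "spatial" and o in local)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
    print(datasets_about(EXTERNAL_PLACE, TRIPLES))
```

The alignment triple is what lets a query phrased against the external reference reach a record that only uses the local vocabulary. This is the interlinking that the formalization work is meant to enable.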

Candidate profile:
You hold a level 7-8 degree (engineering school or PhD) in data science or knowledge engineering.

Required education and skills:
Very good command of the concepts, methods and tools related to data and knowledge modeling.
Specialization in building and curating terminological reference vocabularies/ontologies.
Knowledge of Semantic Web technologies (concepts, languages).
Proficiency with tools for building, aligning or aggregating ontologies.
Knowledge of the main interoperability standards for metadata and data is highly desirable.
Rigor and analytical skills, particularly for structuring knowledge or aligning reference vocabularies.
Working knowledge of English (B2-C1).
Ability to design training programs to train or inform scientists.
Command of project management concepts; ability to manage an organization.
You demonstrate the following personal qualities:

Good interpersonal and listening skills, so as to participate fully in international working groups.
Ability to work in interdisciplinary or inter-organization teams.
Ability to share knowledge and to teach.
Curiosity and the ability to grasp a new, evolving field.

Work location:
IRD Occitanie regional delegation, Montpellier
