On Capturing and Using Provenance in Machine Learning Pipelines

Related Action/Network: none/Doctorants

Laboratory/Company: LAMSADE
Duration: 5 to 6 months
Contact: kbelhajj@googlemail.com
Publication deadline: 2022-03-24

Context:
Machine learning (ML) pipelines are designed to generate predictive models from raw data. The learned models are then used to make predictions on new (unseen) observations. The predictive power of a learned model depends largely on the data sets used for training and on how they have been preprocessed (engineered). ML-pipeline developers tend to rely mainly on their skills, past experience, and an iterative trial-and-error process to refine and improve their pipelines.

Subject:
We seek to investigate how provenance information can be used to improve the process by which ML pipelines are designed and refined. In particular, the sub-tasks of the internship are as follows:
*T1*. A survey of the state of the art in provenance for data preprocessing and machine learning.
*T2*. Identifying techniques for collecting and using provenance to assist ML developers in designing, improving, and debugging ML pipelines.
*T3*. Implementing a prototype and validating it in the context of a real-world ML pipeline.
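
To make the idea of provenance capture concrete, here is a minimal sketch of one way a pipeline step could be instrumented. It is not the method the internship will develop: the names `PROVENANCE_LOG`, `fingerprint`, `track`, `drop_missing` and `scale` are hypothetical, and the data is a toy example.

```python
import hashlib
import json
from functools import wraps

# Hypothetical in-memory provenance log; a real system would persist it.
PROVENANCE_LOG = []

def fingerprint(obj):
    """Return a short, stable hash of a JSON-serialisable object."""
    payload = json.dumps(obj, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]

def track(step):
    """Decorator recording the name, parameters and input/output hashes of a step."""
    @wraps(step)
    def wrapper(data, **params):
        result = step(data, **params)
        PROVENANCE_LOG.append({
            "step": step.__name__,
            "params": params,
            "input": fingerprint(data),
            "output": fingerprint(result),
        })
        return result
    return wrapper

@track
def drop_missing(rows):
    # Remove rows that contain a missing value.
    return [r for r in rows if None not in r.values()]

@track
def scale(rows, column, factor):
    # Multiply one numeric column by a constant factor.
    return [{**r, column: r[column] * factor} for r in rows]

raw = [{"age": 30, "income": 2000}, {"age": None, "income": 1500}]
clean = scale(drop_missing(raw), column="income", factor=0.001)

for record in PROVENANCE_LOG:
    print(record)
```

Because the output hash of one step equals the input hash of the next, the log can be read as a lineage chain. A developer debugging a pipeline could use such a chain to see which preprocessing step produced a given intermediate data set.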

Candidate profile:
The candidate must be a Master’s student or an engineering student in their final year of study. To apply, send your CV, a letter of motivation, and transcripts of the last three years to kbelhajj@gmail.com and daniela.grigori@lamsade.dauphine.fr

Training and skills required:
Familiarity with data processing as well as unsupervised and supervised machine learning algorithms.

Work address:
Université Paris Dauphine, Place du Maréchal De Lattre de Tassigny, 75016, Paris

Attached document: 202202240950_Internship-MLPipelinesProvenance.pdf

Integrating an explainability method for opinion analysis on social media

Related Action/Network: none/none

Laboratory/Company: ETIS UMR 8051
Duration: 6 months
Contact: maria.malek@cyu.fr
Publication deadline: 2022-02-24

Context:
In our current work on social media analysis, we explore the combination of classical opinion-mining methods with social network analysis, and its impact on opinion formation and propagation, in order to build a coherent opinion model.
To study the impact of influential users (influential nodes), we first integrate several influence factors extracted from the network into the opinion-mining process. These factors are generally computed using different centrality measures such as degree, closeness, betweenness, PageRank centrality, etc.
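
As an illustration of two of the centrality measures mentioned above, the following sketch computes in-degree centrality and PageRank on a small directed graph using only the standard library. The graph, the user names and the "retweets" reading of an edge are assumptions made for the example; the PageRank uses plain power iteration and spreads the rank of nodes without outgoing edges uniformly.

```python
def degree_centrality(graph):
    """In-degree of each node, normalised by (n - 1)."""
    n = len(graph)
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for t in targets:
            indegree[t] += 1
    return {node: d / (n - 1) for node, d in indegree.items()}

def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict {node: [successors]}."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edge is shared by everyone.
        dangling = sum(rank[v] for v in graph if not graph[v])
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in graph
        }
        for node, targets in graph.items():
            for t in targets:
                new_rank[t] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank

# Hypothetical toy network: an edge u -> v means "u retweets v".
graph = {
    "alice": ["carol"],
    "bob": ["carol"],
    "carol": ["dave"],
    "dave": [],
}

deg = degree_centrality(graph)
pr = pagerank(graph)
print(max(deg, key=deg.get), max(pr, key=pr.get))
```

On this toy graph the two measures disagree on the most influential node, which shows how the choice of centrality can change the influence factor fed into opinion mining.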

We then define and study the notion of opinion stability within egocentric networks around influencers and within detected communities, our objective being to detect opinion change in both types of sub-networks.
We analyze the resulting communities in order to understand the opinions emerging from them, not only in terms of user profiles but also in terms of topological elements. We also wish to propose indicators of opinion stability, and others related to opinion change.

Subject:
The goal of the internship is to propose and integrate an explainability method into opinion-analysis algorithms in order to produce emergent explanations that combine nodal information (such as the user profile) with topological information extracted from the structure of the opinion propagation graph.
By integrating a suitable explainability method, we also wish to make the results concerning opinion polarity, at the user level and at the group level, easier to understand. Likewise, the model must be able to explain detected opinion changes in relation to the information extracted from the propagation network and the sequences of actions taken (for example: tweets, retweets, replies) leading to the change.

Candidate profile:
Master 2 or final year of engineering school

Training and skills required:
Good knowledge of Machine Learning and Python programming.

Work address:
2 Av. Adolphe Chauvin, 95300 Pontoise, building A, 5th floor, ETIS laboratory.

Attached document: 202202231431_Stage_M2_ETIS_Explicabilite_AnalyseOpinions.pdf

Ph.D. Position: Learning Spatio-temporal data by graph representations

Related Action/Network: none/none

Laboratory/Company: LIFAT EA 6300 (lifat.univ-tours.fr)
Duration: 3 years
Contact: donatello.conte@univ-tours.fr
Publication deadline: 2022-02-24

Context:
The thesis is part of, and funded by, the ANR project CodeGNN (http://www.normastic.fr/projet-anr-codegnn/); the gross salary is approximately 2 000 €.
The PhD could start around October 2022.

Supervisors and contact
Donatello Conte (University of Tours, France) donatello.conte@univ-tours.fr
Sébastien Bougleux (Université de Caen Normandie, France) bougleux@unicaen.fr
Nicolas Ragot (University of Tours, France) nicolas.ragot@univ-tours.fr

Subject:
In many application domains like action recognition or prediction, video segmentation, traffic forecasting or anomaly detection in brain activity signals, time-varying data are frequently represented by graphs. Two main representations are commonly considered: a temporal sequence of graphs or a spatio-temporal graph connecting graph nodes through time. While there is a solid literature on data analysis based on such representations, the domain has strongly evolved over the last 5 years with the advances in deep learning on Graph Neural Networks.
Such methods have been less investigated for time-varying graphs, particularly when both the graph structure and the data attached to this structure are varying.
Two main families of models can be distinguished: Recurrent Neural Networks (RNNs) combined with spatial convolutions, which rely on the sequential representation [1, 2, 3], and Graph Convolutional Networks that alternate temporal and spatial convolutions [4, 5, 6].
The aim of this thesis is to:
1. Study new representations for spatio-temporal graphs: we want to investigate some new representations in two main directions: representing temporal data as attributes of nodes and edges, and representing temporal data as edge connections between spatial positions represented by nodes at different times.
2. Propose new Neural Network architectures for data represented by this kind of graph: we want to propose adapted convolutions, decimation and pooling, and study the definition of a recurrent neural network that operates directly in the space of graphs (for example, generating new graphs). One direction of study will also be Spatial-Temporal Graph Attention Networks (STGAT [7]) and Graph Transformer Networks (GTN [8]).
3. Program these models (in Python), and compare them to the state-of-the-art on standard datasets for different applications, in particular, skeleton-based gesture recognition.
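
The two representations mentioned above (a temporal sequence of graphs, and a single spatio-temporal graph linking nodes through time) can be related by a short sketch. It is only an illustration under simple assumptions: the function name `build_spatio_temporal_graph`, the edge labels and the skeleton fragment are hypothetical, and a node is linked through time only to its own copy at the next step.

```python
def build_spatio_temporal_graph(snapshots):
    """Merge a temporal sequence of graphs into one spatio-temporal graph.

    `snapshots` is a list of edge lists, one per time step. Nodes of the
    result are (node, t) pairs; each edge is tagged "spatial" or "temporal".
    """
    edges = []
    nodes_at = []
    for t, snapshot in enumerate(snapshots):
        nodes = set()
        for u, v in snapshot:
            nodes.update((u, v))
            edges.append(((u, t), (v, t), "spatial"))
        nodes_at.append(nodes)
    # Link each node to its own copy at the next time step, when it exists.
    for t in range(len(snapshots) - 1):
        for node in sorted(nodes_at[t] & nodes_at[t + 1]):
            edges.append(((node, t), (node, t + 1), "temporal"))
    return edges

# Hypothetical skeleton fragment observed over three frames; the structure
# changes in the last frame, where the "hand" joint is no longer present.
snapshots = [
    [("shoulder", "elbow"), ("elbow", "hand")],
    [("shoulder", "elbow"), ("elbow", "hand")],
    [("shoulder", "elbow")],
]
st_edges = build_spatio_temporal_graph(snapshots)
for edge in st_edges:
    print(edge)
```

The last frame shows the case the text singles out as less studied: the graph structure itself varies over time, so some temporal links simply do not exist.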

References
[1] A. Jain, A. R. Zamir, S. Savarese, and A. Saxena, “Structural-rnn: Deep learning on spatio-temporal graphs,” CoRR, vol. abs/1511.05298, 2015.
[2] C. Si, W. Chen, W. Wang, L. Wang, and T. Tan, “An attention enhanced graph convolutional lstm network for skeleton-based action recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1227–1236, 2019.
[3] M. Li, S. Chen, Y. Zhao, Y. Zhang, Y. Wang, and Q. Tian, “Dynamic multiscale graph neural networks for 3d skeleton based human motion prediction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 214–223, 2020.
[4] B. Yu, H. Yin, and Z. Zhu, “Spatio-temporal graph convolutional neural network: A deep learning framework for traffic forecasting,” CoRR, vol. abs/1709.04875, 2017.
[5] L. Shi, Y. Zhang, J. Cheng, and H. Lu, “Skeleton-based action recognition with directed graph neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7912–7921, 2019.
[6] T. Chen, D. Zhou, J. Wang, S. Wang, Y. Guan, X. He, and E. Ding, “Learning multi-granular spatio-temporal graph network for skeleton-based action recognition,” arXiv preprint arXiv:2108.04536, 2021.
[7] X. Kong, W. Xing, X. Wei, P. Bao, J. Zhang, and W. Lu, “STGAT: Spatial-temporal graph attention networks for traffic flow forecasting,” IEEE Access, vol. 8, pp. 134363–134372, 2020.
[8] S. Yun, M. Jeong, R. Kim, J. Kang, and H. J. Kim, “Graph transformer networks,” Advances in Neural Information Processing Systems, vol. 32, 2019.

Candidate profile:
– Master’s degree in Computer Science, Applied Mathematics, Data Science, or similar.

Training and skills required:
– strong background in computer science and mathematics
– experience in neural networks, deep learning, Python programming, and numerical analysis will be an advantage
– knowledge of video and image analysis would be appreciated
– good communication and reporting skills, autonomy and curiosity

Work address:
LIFAT, 64 Avenue Jean Portalis, 37200 Tours

Attached document: 202202231030_PhD_Thesis_Proposal_SpatioTemporalGraphs.pdf

Journées du GdR MAGIS

Date: 2022-03-21 => 2022-03-23
Location: Grenoble, Saint-Martin-d’Hères campus
IMAG building

Renewed in 2022 for 5 years by its two parent institutes, INS2I and INSHS, and with the support of INEE, the CNRS GdR MAGIS brings together 350 researchers and engineers from 55 research units working at the crossroads of computer science, geography, and environmental sciences.

These days are a prime opportunity to learn about the roadmaps of the 5 cross-cutting projects and the 14 working groups (called Actions de Recherche) that will drive the GdR’s activity over the next 5 years.

The event is organized around a plenary session held in person (Monday 21 March in the afternoon and Tuesday 22 March all day) and workshops, held in person and for some in hybrid mode, which take place before or after the plenary.

The program is available here

Registration for the plenary and the workshops is also done on this site. Note that registration is free but mandatory for logistical reasons.

Our website: www.madics.fr
Follow us on Twitter: @GDR_MADICS
To unsubscribe from the list, follow this link.

Optimized Performance Techniques for Next Generation Satellite Communication Networks

Related Action/Network: none/Doctorants

Laboratory/Company: Institut Fresnel
Duration: 3 years
Contact: andre@fresnel.fr
Publication deadline: 2022-02-24

Context:
With the rise of Internet-of-Things (IoT) applications and the need for massive connectivity, future 6G networks should meet the demand for global access to high-speed Internet [1]. One of the envisaged solutions consists in deploying non-terrestrial networks, such as networks of satellites or microsatellites in low Earth orbit (LEO). Such satellites have much lower manufacturing and launch costs than traditional satellites, such as those placed in geostationary orbit. These very high-throughput satellite (VHTS) networks will be able to meet the substantial data traffic requirements of the future [1,2]. The specificity of these satellites (or microsatellites) is that they have limited capacities and resources (energy, computing, etc.). However, they are more flexible in terms of resource management, such as power and bandwidth allocation. Another particularity of such networks is the irregular distribution of users on the ground and the variability of connections and, therefore, of data traffic over time. This calls for energy-efficient, high-speed connectivity solutions for inter-satellite and satellite-to-ground links. In particular, laser communications, or free-space optics (FSO) technology, promise high-rate and secure data transmission over very large distances [3].

Subject:
In practice, establishing such links raises several challenges in terms of (a) link availability/reliability and (b) resource management at the satellite. Indeed, the irregular distribution of users on the ground and the variability of data traffic during the day call for efficient architectures with flexible resource allocation according to the requested traffic [6].

(a) The first objective is to propose advanced transmission techniques to establish high-speed, high-reliability communication links between microsatellites or between a microsatellite and a ground station [7]. These solutions must in particular take into account the atmospheric channel and the vibrations of the payloads, which can cause significant pointing errors (i.e., misalignment between the transmitter and the receiver) [4]. This first step includes the modeling of optical communication channels and will be carried out in collaboration with the University of Edinburgh.

(b) In a second step, machine learning-based mechanisms will be designed to perform automated resource allocation in order to increase the capacity of satellite-Earth links [8-9]. This will exploit the flexibility of microsatellites in terms of resource management, such as power and bandwidth allocation.

For more details, see the attached file.

Candidate profile:
A solid background in signal processing is an important asset. Experience or training in digital communications is also very welcome. The candidate must have very good English proficiency (oral and written) and be open to short-term stays in partner laboratories.

Training and skills required:
Master’s degree or engineering school in signal processing, telecommunications, data science, statistics, mathematics, computer science …

Work address:
52 Av. Escadrille Normandie Niemen, 13013 Marseille

Attached document: 202202221036_Thesis-SatCom-FSO-English(1).pdf

Point cloud based large-scale place recognition (IGN, Paris area)

Related Action/Network: none/none

Laboratory/Company: LaSTIG
Duration: 36 months
Contact: valerie.gouet@ign.fr
Publication deadline: 2022-04-08

Context:
Thesis proposal: Point cloud based large-scale place recognition – Application to the prevention against fake news

*** Subject of the thesis

The thesis project focuses on 3D point cloud based large-scale place recognition, applied to the geolocation of 3D image data. Without any prior information on the initial position, geolocating image content relies on indexing and retrieving content similarities in a geolocated reference. This thesis proposes to study this type of approach by exploiting 3D maps built from acquisition campaigns (in particular LiDAR), which are becoming mainstream thanks to their high-quality geometry reconstruction; this makes them attractive, but also complex to handle given their volume and diversity. Please consult the full text in PDF for the description of the thesis subject.

*** Context

Place recognition from images has numerous fields of application; here we deal with the geolocation of amateur video sequences as a certification tool for the prevention of fake news. Massively spread on social networks and on the web, amateur videos relaying information or an event are now very important; some of them are fake news, i.e. content taken out of its original context to convey bad or false information. To fight this form of misinformation, several media outlets, such as the French public television channel “France TV”, have set up a fact-checking unit for images and videos which analyzes, verifies and certifies these streams. This complex work is done by hand and would benefit from automation using artificial intelligence tools. Verifying geolocation has been recognized as essential to best explain what is happening. It is in this collaborative context between IGN and France TV that we focus on this geolocation criterion, with the aim of exploiting today’s best georeferencing repositories to offer automatic large-scale geolocation solutions which can, among other things, contribute to the fact checking of visual information.

Subject:
Full description in English: https://www.umr-lastig.fr/vgouet/News/annonce_these_PlaceReco3D_2022-EN.pdf

Full description in French: https://www.umr-lastig.fr/vgouet/News/annonce_these_PlaceReco3D_2022-FR.pdf

Candidate profile:
*** Candidate profile

Bac+5 in computer science, applied mathematics or geomatics (Master’s degree or engineering school).

Please note that only students from the European Union, the United Kingdom or Switzerland are eligible for this thesis project.

*** How to apply

Before March 28, 2022, please send the following documents to both contacts in a single PDF file:
– A detailed CV
– A topic-focused cover letter
– Grades and ranks over the last 3 years of study
– The contact details of 2 referees who can recommend you

*** Contacts

– Laurent Caraffa – Laurent.Caraffa@ign.fr, Researcher at LaSTIG (thesis supervisor), IGN, Gustave Eiffel University
– Valérie Gouet-Brunet – Valerie.Gouet@ign.fr, Research director at LaSTIG (director of the thesis), IGN, Gustave Eiffel University

Training and skills required:
A good background in machine learning is required, and knowledge of 3D computer vision or image indexing will be appreciated. The successful candidate must have good programming skills (Python, C/C++). Although fluency in French is not required, fluency in English is necessary. Curiosity, open-mindedness, creativity, perseverance and the ability to work in a team are also key personal skills.

Work address:
*** Organization

* Start: last quarter of 2022

* Place: the thesis will be carried out in the Paris area at the LaSTIG laboratory, located in Saint-Mandé (73 avenue de Paris, Saint-Mandé metro, line 1) on the premises of the IGN. The doctoral student will be attached to the MSTIC Doctoral School (ED 532).

The French mapping agency IGN (National Institute for Geographic and Forest Information) is a public administrative establishment attached to the French Ministry of Ecological Transition; it is the national reference operator for mapping the French territory. The LaSTIG (Laboratory in Sciences and Technologies of Geographic Information for the smart city and sustainable territories) is a joint research unit attached to the Gustave Eiffel University, the IGN and the School of Engineering of the city of Paris (EIVP). It is a unique research structure in France and even in Europe, bringing together around 80 researchers who cover the entire life cycle of geographic or spatial data, from acquisition to visualization, including modeling, integration and analysis; about thirty of them work in image analysis, computer vision, machine learning, photogrammetry and remote sensing. LaSTIG researchers can be involved in the teaching activities of the IGN engineering school, the ENSG (Ecole Nationale des Sciences Géographiques), which gives access to high-quality undergraduate and graduate students in fields related to geographic information sciences: geodesy, photogrammetry, computer vision, remote sensing, spatial analysis, cartography, etc.

Attached document: 202202212218_annonce_these_PlaceReco3D_2022-EN.pdf

Associate Professor (MCF), CNU section 27, on secondment (délégation) at the Université de la Nouvelle Calédonie

Related Action/Network: none/none

Laboratory/Company: Institut des Sciences Exactes et Appliquées (ISEA)
Duration: 2 years, renewable
Contact: nazha.selmaoui@univ-nc.nc
Publication deadline: 2022-04-08

Context:
Recruitment of an associate professor (MCF) on secondment at the Université de la Nouvelle Calédonie.

Subject:
See the profile in the attached document.

Candidate profile:
The person recruited will have a research profile related to machine learning, data mining, data science, big data and applications, as well as versatility in teaching.

Training and skills required:
Must already hold a tenured position in higher education.

Work address:
Nouméa, Nouvelle Calédonie

Attached document: 202202212158_MCF-CNU-27-D-prolongation-28.02.2022.pdf

Handling classes’ imbalance in supervised classification for medical diagnostics

Related Action/Network: none/none

Laboratory/Company: LAMSADE – Pôle Sciences des Données – Université P
Duration: 5-6 months
Contact: sana.mrabet@dauphine.psl.eu
Publication deadline: 2022-03-05

Context:
The classification of highly imbalanced data is a major challenge for machine learning techniques. Many solutions have been proposed to deal with it; they can be grouped into three categories: data pre-processing with under/oversampling, which creates a training sample with a new distribution of instances; active sampling, which changes the training sample throughout the learning process; and the Synthetic Minority Over-sampling Technique (SMOTE), which creates new synthetic instances in the minority class. The efficiency of each approach depends on the context. For medical diagnostics, if the input data contains categorical attributes, SMOTE methods may not be suitable. Moreover, if the data imbalance ratio is high, under/oversampling could induce a loss of information in the training sample.
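
The core idea of SMOTE described above can be sketched in a few lines. This is a simplified illustration and not a reference implementation: the function name `smote`, its parameters and the sample points are assumptions, and it uses only the standard library. Each synthetic point is drawn on the segment between a minority sample and one of its nearest minority neighbours.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (numeric features)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

# Hypothetical minority-class samples with two numeric features.
minority = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.9, 2.1)]
new_points = smote(minority, n_new=6)
for point in new_points:
    print(point)
```

The interpolation step only makes sense for numeric features, which is one way to see why the text warns that SMOTE may be unsuitable when the data contains categorical attributes.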

Subject:
Study and compare three different approaches to handling class imbalance in medical data: data pre-processing with over/undersampling, synthetic minority over-sampling, and active sampling.

Candidate profile:
Master 2 or final year of engineering school in computer science

Training and skills required:
Good knowledge of Machine Learning and Python programming.
Proficiency in English and good writing skills

Work address:
Université Paris Dauphine – PSL
Place du Maréchal de Lattre de Tassigny – 75775 PARIS Cedex 16

Attached document: 202202211348_Proposition sujet mémoire 2022.pdf

Deep neural network compression using tensor methods

Related Action/Network: MACLEAN/none

Laboratory/Company: laboratoire d’informatique et systèmes (LIS) UMR
Duration: 5 to 6 months
Contact: zniyed@univ-tln.fr
Publication deadline: 2022-04-30

Context:
Deep Neural Networks (DNNs) demonstrate good predictive performance in numerous applications. However, their architectures are very large, reaching several million parameters, and running them on systems with limited computing capacity (embedded systems) is difficult. For this reason, this internship project focuses on the compression of DNNs by tensor methods.

Subject:
This internship project deals with the study of new compression techniques for deep neural networks, using tensor decompositions to model and factorize the DNN weights. Recent studies show that DNN weight matrices are often redundant and that, by restricting their ranks, it is possible to significantly reduce the number of parameters without a significant drop in performance. In this project, we propose to convert these matrices to a tensor format and to use multidimensional data processing methods to compress them. The goal of this internship is to study different tensor representations, such as the canonical polyadic decomposition (CPD) or the Tucker decomposition (TD), for the compression of the converted multidimensional weights. Specifically, we will study the compactness of these representations and their impact on the predictive accuracy of DNNs. In a first stage, the intern will review the existing state-of-the-art tensor-based compression techniques and become familiar with tensor decompositions. Then, we will compare different representations with the goal of improving them and proposing new tensor-based schemes for DNN compression.
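
To give a feel for the compactness of these representations, the sketch below counts the parameters stored by a dense weight matrix, a low-rank matrix factorization, a CPD and a Tucker decomposition. The layer size, the reshaping into a 4-way tensor and the ranks are hypothetical values chosen for illustration; the counts say nothing about the accuracy that would be retained.

```python
from math import prod

def dense_params(dims):
    """Number of entries in a full tensor (or matrix) of shape `dims`."""
    return prod(dims)

def low_rank_params(rows, cols, rank):
    """Parameters of a rank-`rank` factorization W ~ A B of a rows x cols matrix."""
    return rank * (rows + cols)

def cpd_params(dims, rank):
    """CPD stores one (dim x rank) factor matrix per mode."""
    return rank * sum(dims)

def tucker_params(dims, ranks):
    """Tucker stores a core tensor of shape `ranks` plus one factor per mode."""
    return prod(ranks) + sum(d * r for d, r in zip(dims, ranks))

# Hypothetical example: a 1024 x 1024 weight matrix reshaped into a
# 32 x 32 x 32 x 32 tensor before decomposition.
dims = (32, 32, 32, 32)
counts = {
    "dense": dense_params(dims),
    "low-rank matrix (r=64)": low_rank_params(1024, 1024, 64),
    "CPD (R=16)": cpd_params(dims, 16),
    "Tucker (8,8,8,8)": tucker_params(dims, (8, 8, 8, 8)),
}
for name, count in counts.items():
    print(f"{name}: {count} parameters")
```

Comparing such counts against the measured drop in accuracy is the kind of compactness versus performance trade-off the internship sets out to study.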

This internship can be followed by a Ph.D. research project starting in October 2022 at LIS, Toulon.

Candidate profile:
M2R or engineering school students majoring in signal processing, machine learning or applied mathematics.

Training and skills required:
Good Python programming skills are required. Knowledge of deep learning frameworks is a plus. The candidate should have good writing and oral communication skills.

Work address:
The intern will join the Signal and Image (SIIM) research team at the LIS laboratory, Toulon.
The internship will be supervised by Yassine Zniyed (Associate Professor at Université de Toulon) and Thanh Phuong Nguyen (Associate Professor/HDR at Université de Toulon).

Attached document: 202202210815_Stage_M2R_2022.pdf

1st call-for-participation JOKER@CLEF: Automatic Wordplay and Humour Translation Task

Date : 2022-04-22

Deadlines

Data & guidelines release: February – March 2022

Run submission: 22 April 2022

Draft paper submission: 27 May 2022

CLEF conference: 5–8 September 2022

Context

Humour remains one of the most difficult aspects of intercultural communication: understanding humour often requires understanding implicit cultural references and/or double meanings, and this raises the question of its (un)translatability. Wordplay is a common source of humour due to its attention-getting and subversive character. The translation of humour and wordplay is therefore in high demand. Modern translation depends heavily on technological aids, yet few works have treated the automation of humour and wordplay translation, or the creation of humour corpora. The goal of the JOKER workshop is to bring together translators and computer scientists to work on an evaluation framework for wordplay, including data and metric development, and to foster work on automatic methods for wordplay translation.

Tasks

We invite you to submit both automatic and manual runs! Manual intervention should be reported.

Task 1: Classify and explain instances of wordplay.

Task 2: Translate single words containing wordplay.

Task 3: Translate entire phrases containing wordplay.

Unshared task: We welcome any other type of submission that uses our data as an open task.

How to participate
Sign up at the CLEF website (https://clef2022-labs-registration.dei.unipd.it/). All team members should join the JOKER mailing list (https://groups.google.com/u/4/g/joker-project). The data will be made available to all registered participants.

Contacts

JOKER website: http://joker-project.com/

CLEF website: https://clef2022.clef-initiative.eu/index.php

Registration: https://clef2022-labs-registration.dei.unipd.it/

Email: contact@joker-project.com

Twitter: https://twitter.com/joker_research

Google Group: https://groups.google.com/u/4/g/joker-project

MaDICS

Masses de Données, Informations et Connaissances en Sciences

Big Data - Data Science
