MaDICS

PhD position on Design of transparency mechanisms for online targeted advertising at LIG (within the MIAI 3IA institute)

Sep 30 – Oct 1 all-day

Announcement related to the Action/Network: none

Laboratory/Company: LIG
Duration: 3 years
Contact: patrick.loiseau@univ-grenoble-alpes.fr
Publication deadline: 2019-09-30

Context:
The Facebook advertising platform has been the source of a number of controversies in recent years: privacy violations, a lack of transparency in the information it gives users about the ads they see, and, more recently, its exploitation by dishonest actors for discriminatory advertising or ad-driven political propaganda aimed at influencing elections.

This situation has led many governments and privacy advocates to push Facebook to make its platform more transparent and more accountable for the ads that circulate on it, and to push for laws requiring transparency. For example, the EU General Data Protection Regulation (GDPR) mentions a “right to explanation”. However, how to make such systems more transparent remains an open question. Indeed, in recent work [0,5] we showed that the transparency mechanism Facebook provides through the “why am I seeing this ad?” button hides key reasons for showing ads, and that the way these explanations are designed allows advertisers to easily obfuscate the explanations of ad campaigns that are discriminatory or that target privacy-sensitive attributes.

Subject:
The goal of the PhD thesis is to study the sources of risks in social media advertising and design transparency mechanisms to reduce these risks. The PhD candidate will be able to investigate various directions:

1. How to provide explanations without the collaboration of the advertising platform. The idea is to reverse-engineer the targeting formula in order to infer why an ad was targeted at a particular person, using statistics and machine learning techniques to group together people who receive the same ad or ads and study their most predominant properties.

2. What information users, regulators, and news media need access to in order to identify misbehaving advertisers that are, for example, spreading misinformation, sending duplicitous messages, or building discriminatory ad campaigns.

3. What properties of explanations make them robust to malicious attackers who try to avoid detection. For example, if an explanation is not complete (does not show all the targeting attributes used by the advertiser), an advertiser could hide that their ad campaigns are discriminatory.
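As an illustration of the first direction, the sketch below ranks profile attributes by how over-represented they are among the recipients of one ad. It is only a minimal example under stated assumptions: the function name `overrepresented_attributes`, the `min_lift` threshold, and the toy user data are all hypothetical and do not come from AdAnalyst or from the announcement.

```python
from collections import Counter


def overrepresented_attributes(users, recipients, min_lift=1.5):
    """Rank attributes by lift: share among ad recipients / share among all users.

    `users` maps a user id to a set of attributes (hypothetical format);
    `recipients` lists the ids of users who received the same ad.
    """
    overall = Counter(a for attrs in users.values() for a in attrs)
    among = Counter(a for u in recipients for a in users[u])
    n_all, n_rec = len(users), len(recipients)
    lifts = {}
    for attr, count in among.items():
        lift = (count / n_rec) / (overall[attr] / n_all)
        if lift >= min_lift:
            lifts[attr] = lift
    # Highest lift first: these are candidate targeting attributes.
    return sorted(lifts.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data, for illustration only.
users = {
    "u1": {"age:18-24", "interest:cycling"},
    "u2": {"age:18-24", "interest:cooking"},
    "u3": {"age:35-44", "interest:cycling"},
    "u4": {"age:35-44", "interest:cooking"},
}
recipients = ["u1", "u3"]  # users who all received the same ad
top = overrepresented_attributes(users, recipients)
print(top)
```

A high lift only suggests that an attribute may have been part of the targeting. It does not prove it, since correlated attributes and the platform's own delivery optimization can produce the same pattern.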

The student will be able to work with more than 200k real-world ads received by more than 1000 users, which we collected using our browser extension AdAnalyst (www.adanalyst.mpi-sws.org). Throughout the project, the student will become familiar with the online targeted advertising ecosystem and apply machine learning techniques to real-world data. The student will also take part in the maintenance of AdAnalyst and will be encouraged to implement the proposed transparency mechanisms in AdAnalyst.

Candidate profile:
Candidates should hold (or be about to get) an MSc degree in computer science and have:
• Strong coding skills.
• Experience in working with data.
• Strong motivation.
• Interest in the societal impact of advertising platforms.

Required training and skills:
Candidates should hold (or be about to get) an MSc degree in computer science and have:
• Strong coding skills.
• Experience in working with data.
• Strong motivation.
• Interest in the societal impact of advertising platforms.

Work address:
LIG, Bâtiment IMAG
700 av Centrale, Domaine Universitaire
38400 Saint Martin d’Hères

Attached document: 2019_PhD_ads_transparency.pdf

Categories: theses

PhD position on Measuring and preventing the impact of harmful online ads on children and teenagers at LIG (within the MIAI 3IA institute)

Sep 30 – Oct 1 all-day

Announcement related to the Action/Network: none

Laboratory/Company: LIG
Duration: 3 years
Contact: patrick.loiseau@univ-grenoble-alpes.fr
Publication deadline: 2019-09-30

Context:
Children and teenagers live in a highly digitalized world. Technology and the Internet facilitate children’s right to access information, to learn, and to lead a social life. However, these benefits and opportunities must be balanced with mechanisms that protect their rights to be free from economic exploitation and to have their privacy respected.

Despite the importance of protecting these vulnerable segments of our population, little is known about what data advertising platforms collect about children and teenagers, and how advertisers use this data to target them with ads. In a recent report, the WHO emphasizes that “childhood obesity and marketing of unhealthy products are among the main concerns” and that “digital marketing of these products is a new, global public health challenge that needs to be urgently tackled” [5].

Subject:
The goal of the PhD thesis is to analyze the ads children and teenagers receive, assess their impact through controlled experiments, and propose transparency mechanisms that are able to surface harmful online ads and reduce their impact. The student will approach the work in 4 steps:

1. Tool development: Instagram and YouTube are the two platforms most used by children and teenagers. The student will need to build a tool similar to AdAnalyst (adanalyst.mpi-sws.org) that collects the ads users see in their Instagram feeds and when they watch YouTube videos. The student will be able to get advice and help from the PhD students who are currently developing AdAnalyst for their research.

2. Analysis of data: The student will first use data mining and natural language processing techniques to identify harmful online ads (e.g., ads that promote unhealthy foods). The student will then analyze: (1) What is the extent and nature of children’s exposure to ads for unhealthy foods and drinks? (2) What is the extent and nature of children’s engagement with food and drink advertising? (3) How do food and beverage marketers target and reach children with advertising in digital media?

3. Controlled experiments: We will promote the tool in high schools and target volunteers with well-crafted ads to assess the impact that various ads have on teenagers.

4. Design of transparency mechanisms: The end goal of the thesis will be to design mechanisms to increase the transparency in online advertising and allow children/teenagers and regulators to detect harmful practices.

Candidate profile:
Candidates should hold (or be about to get) an MSc degree in computer science and have:
• Strong coding skills. Experience with coding on mobile platforms is a plus.
• Experience in working with data.
• Strong motivation.
• Interest in the societal impact of advertising platforms.

Required training and skills:
Candidates should hold (or be about to get) an MSc degree in computer science and have:
• Strong coding skills. Experience with coding on mobile platforms is a plus.
• Experience in working with data.
• Strong motivation.
• Interest in the societal impact of advertising platforms.

Work address:
LIG, Bâtiment IMAG
700 av Centrale, Domaine Universitaire
38400 Saint Martin d’Hères

Attached document: 2019_PhD_ads_kids.pdf

Categories: theses

PhD position: Statistical learning theory and hybrid dynamical system identification

Sep 30 – Oct 1 all-day

Announcement related to the Action/Network: none

Laboratory/Company: LORIA/CRAN, Nancy
Duration: 3 years
Contact: fabien.lauer@loria.fr
Publication deadline: 2019-09-30

Context:
Automatic control deals with the analysis and control of dynamical systems, such as the evolution of a chemical reaction over time, the behavior of an electrical system, or the trajectory of a plane. To conduct this analysis or control, the first step consists in modeling the system, i.e., in building a mathematical model describing its behavior. In most cases, and as soon as the physical principles at play are not perfectly known or are too complex, this modeling step is performed from experimental observations of the system behavior.

System identification is the subfield of automatic control that focuses on the estimation of models from such data. In this framework, it is of particular importance to obtain guarantees on the model accuracy. System identification theory, which is mostly based on parametric statistics, provides asymptotic guarantees under rather restrictive assumptions (such as a precise specification of the noise, the model, and the estimation method). These guarantees are often not suitable for precisely quantifying the error of a model estimated from a finite amount of data.

In the field of artificial intelligence, building predictive models from experimental data is studied in the framework of machine learning. In the “big data” era, this data science has become ubiquitous in many computer science applications, but also in many other domains such as biology, medical imaging, and robotics. Here too, guaranteeing the performance of the estimated models is of primary importance, and this is the focus of statistical learning theory. Contrary to classical parametric statistics, this theory provides guarantees in a much less restrictive, agnostic framework (with nonparametric models, without assumptions on the noise or on the shape of the optimal model), and in particular nonasymptotic bounds on the prediction error of models estimated from a finite amount of data. However, most results of this type are established under the assumption that the observations are independent, which is rather standard in many contexts but not suited to dynamical system identification.

Subject:
This project aims to bridge the gap between these two disciplines by extending learning theory to non-independent data, in order to obtain the most accurate guarantees for the identification of dynamical systems. The project will in particular consider hybrid dynamical systems. These hybrid systems mix continuous and discrete behaviors and call for models that switch between multiple operating modes. They are found in many applications, such as communication networks, transport systems, industrial processes, engine control, and biological systems.
For such systems, system identification faces an additional difficulty in the practical estimation of the models. Indeed, it is rarely possible in this case to guarantee the accuracy of the optimization algorithms used in practice to fit the model to the data, and thus to precisely characterize the estimated model. Statistical learning theory offers an interesting solution to this issue by deriving uniform error bounds, i.e., bounds that hold for all possible models within a predefined class.
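For orientation, a uniform error bound typically has the following form. The version shown is the classical Rademacher-complexity bound for independent and identically distributed observations and a loss taking values in [0, 1]. It is given only to illustrate what "holding for all models in a class" means; the independence assumption it relies on is precisely the one that does not hold for dynamical systems. The notation ($\mathcal{F}$ for the model class, $L$ for the expected loss, $\hat{L}_n$ for the empirical loss on $n$ observations, $\mathcal{R}_n$ for the Rademacher complexity of the loss class) is an assumption of this sketch, not taken from the announcement.

```latex
% With probability at least 1 - \delta over the draw of n i.i.d. observations,
% simultaneously for every model f in the class \mathcal{F}:
\forall f \in \mathcal{F}:\qquad
L(f) \;\le\; \hat{L}_n(f) \;+\; 2\,\mathcal{R}_n(\ell \circ \mathcal{F})
\;+\; \sqrt{\frac{\log(1/\delta)}{2n}}
```

Because the bound holds for every $f \in \mathcal{F}$ at once, it also applies to whatever model the optimization algorithm returns, even when that algorithm offers no accuracy guarantee of its own.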

Results of this project are expected to have an impact in system identification, but also more downstream, at the level of robust control. Contributions related to the prediction of time-series are also expected.

Candidate profile:
We are looking for a student motivated by interdisciplinary research at the crossroads of computer science, control theory, and mathematics.

Required training and skills:
Student holding a master's degree in computer science, control, or applied mathematics, with good knowledge of probability and statistics.

Work address:
Nancy

Attached document: sujetENG.pdf

Categories: theses

PhD position on Detection and impact analysis of issue and political ads at LIG (within the MIAI 3IA institute)

Sep 30 – Oct 1 all-day

Announcement related to the Action/Network: none

Laboratory/Company: LIG
Duration: 3 years
Contact: patrick.loiseau@univ-grenoble-alpes.fr
Publication deadline: 2019-09-30

Context:
The 2016 United States presidential election was marked by an information war that took place across different social media platforms [9, 20]. In particular, the election was marked by the abuse of targeted advertising on Facebook. For example, a group of Russian citizens and companies were indicted by U.S. authorities for trying to influence the 2016 US election through the Facebook Ads Platform [1, 6]. Since then, social media platforms such as Facebook and Google have released transparency platforms that give interested parties access to the ads their platforms identify as ‘political’.

While this is an important step, in the dataset we collected using AdAnalyst (adanalyst.mpi-sws.org) we observed that there are ads related to politics and important issues that are not labeled as such by Facebook. Hence, we believe it is important for independent parties to audit ads on Facebook as well.

Subject:
The goal of the PhD thesis is to detect and study problematic ads in social media and assess the impact they have on users. The candidate is expected to contribute in two ways:

1. The first goal of the thesis is to propose algorithms to detect issue and political ads using machine learning tools such as convolutional neural networks and natural language processing tools. The student can then focus on other kinds of problematic ads, for example those that promote bogus cures for diseases, anti-vaxxer blogs, or scammy financial services.

2. The second goal of the thesis is to design and perform experiments that evaluate whether and to what extent people are influenced by the problematic ads that appear in their Facebook timeline. This is important because users have no control over which ads appear in their timeline, and they might be influenced by ads even if they do not click on them.

Candidate profile:
Candidates should hold (or be about to get) an MSc degree in computer science and have:
• Strong coding skills.
• Experience in working with data.
• Strong motivation.
• Interest in the societal impact of data-driven systems.
• Interest in cognitive sciences and experimental research.

Required training and skills:
Candidates should hold (or be about to get) an MSc degree in computer science and have:
• Strong coding skills.
• Experience in working with data.
• Strong motivation.
• Interest in the societal impact of data-driven systems.
• Interest in cognitive sciences and experimental research.

Work address:
Laboratoire d’Informatique de Grenoble
Bâtiment IMAG, 700 avenue Centrale, Domaine Universitaire
38400, Saint Martin d’Hères

Attached document: 2019_PhD_ads_algorithms.pdf

Categories: theses

Complex-Valued Deep Neural Networks for RADAR Applications

Oct 1 – Oct 2 all-day

Announcement related to the Action/Network: Doctorants

Laboratory/Company: CentraleSupélec SONDRA and ONERA
Duration: 3 years
Contact: jean-philippe.ovarlez@onera.fr
Publication deadline: 2019-09-31

Context:
Radar signals are generally complex-valued (in-phase and quadrature channels with a reduced Shannon sampling rate, polarimetric channels, interferometric channels, etc.). Radar processing schemes are also generally based on complex filtering (FFT, wavelets, Wiener filter, matched filter, etc.) and therefore cannot be implemented with classical, real-valued neural networks. The machine learning networks currently developed in the scientific community are mainly designed for real-valued signals (images, etc.). If the rich information contained in the phase, which is mainly tied to its physical meaning, is to be exploited, conventional deep neural network schemes have to be completely revisited.

Subject:
In this PhD topic, we propose to develop new neural network architectures that take into account the complex-valued nature of radar signals. These new schemes will be based on the design of complex-valued activation functions, complex thresholds, and complex-valued optimization methods based mainly on complex gradient descent. These new methods will then be analyzed in terms of the convergence of extended backpropagation algorithms (which allow complex neural weights to be computed). The improvement brought by such systems will also be analyzed in terms of performance compared with traditional neural networks.
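To make the notion of a complex-valued activation function concrete, the sketch below implements modReLU, one activation known from the literature on complex and unitary networks: it thresholds the magnitude of a complex number and leaves its phase unchanged. This is only an example of the kind of building block involved, not the design the thesis will adopt. The function names `mod_relu` and `complex_neuron` and all numeric values are assumptions made for illustration.

```python
import cmath


def mod_relu(z: complex, bias: float) -> complex:
    """modReLU: apply ReLU to the magnitude (shifted by `bias`), keep the phase."""
    magnitude = abs(z)
    if magnitude == 0.0:
        return 0j  # phase is undefined at the origin; output zero
    scaled = max(magnitude + bias, 0.0)
    return scaled * (z / magnitude)


def complex_neuron(inputs, weights, bias_c, act_bias):
    """A single neuron with complex inputs, complex weights and a complex bias."""
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias_c
    return mod_relu(pre_activation, act_bias)


z = 3 + 4j                      # magnitude 5
out = mod_relu(z, -1.0)         # magnitude shrinks to 4, phase is preserved
print(out, cmath.phase(z), cmath.phase(out))

small = mod_relu(0.3 + 0.4j, -1.0)   # magnitude 0.5 is below the threshold
print(small)
```

Keeping the phase intact is the point of this particular choice: in radar data the phase carries physical information that a real-valued ReLU applied to each channel separately would distort.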

Candidate profile:
Master 2 or top-tier engineering school

Required training and skills:
Strong skills in Mathematics, Statistics, Statistical Signal Processing

Work address:
ONERA Palaiseau and CentraleSupélec,
Paris Saclay

Attached document: Complex-Valued-DNN-for-Radar-Applications.pdf

Categories: theses

CIFRE PhD thesis: Processing heterogeneous multi-source data in a connected-objects environment: application to the interactive management of the TICE company's transport network

Oct 3 – Oct 4 all-day

Announcement related to the Action/Network: none

Laboratory/Company: Laboratoire IBISC, Univ. Evry, Université Paris-Saclay
Duration: 36 months
Contact: Khalifa.Djemal@ibisc.univ-evry.fr
Publication deadline: 2019-10-03

Context:
TICE handles an average of 80,000 passengers per day, which generates a very large volume of data. Given this mass of data, which is growing exponentially, managing it requires high-performance technological and computing resources. In this context, TICE wishes to optimize and control these data in order to better manage its various services. Currently, the data are exploited in an ad hoc and intuitive manner, which does not allow controlled profitability. Indeed, exploring large amounts of often heterogeneous data makes it possible to build a model that explains and monitors the operation of the various services.
More generally, for the company to carry out an objective analysis of how well its various services and their activities are operating, it must first gather the data from all sources considered relevant, then transform them and store them in a very substantial database.
In this context, business intelligence tools, in particular On-Line Analytical Processing (OLAP) systems [1], offer many advantages. Indeed, these systems allow analysis over large databases. More generally, business intelligence [1, 2, 3] and statistical learning approaches make it possible to develop tools for analysis, interpretation, and decision-making [4, 5, 6, 7].

References:
[1] Lucile Sautot, Conception et implémentation semi-automatique des entrepôts de données : application aux données Ecologiques, PhD thesis, Université de Bourgogne, 09/06/2015.
[2] Michaël Briot, Étude sur la mise en œuvre d’un entrepôt de données et conception d’un prototype en vue d’une intégration au sein de France Billet, Master's dissertation, CNAM, 2013.
[3] Alexis Lechervy, Apprentissage interactif et multi-classes pour la détection de concepts sémantiques dans des données multimédia, PhD thesis, Université de Cergy-Pontoise, December 6, 2012.
[4] Rostom Kachouri, Khalifa Djemal, Hichem Maaref, Multiple kernel weighting based SVM for heterogeneous image recognition system, International Journal of Signal and Imaging Systems Engineering (IJSISE), Vol. 4, No. 2, pages 60-70, Inderscience, DOI: 10.1504/IJSISE.2011.041600, 2011.
[5] Rostom Kachouri, Khalifa Djemal and Hichem Maaref, Multi-model classification method in heterogeneous image databases, Pattern Recognition (Elsevier), Vol. 43, No. 12, pages 4077-4088, December 2010.
[6] Khalifa Djemal and Hichem Maaref, Intelligent Information Description and Recognition in Biomedical Image Databases, In: Computational Modeling and Simulation of Intellect: Current State and Future Perspectives, edited by Boris Igelnik, pages 52-80, IGI Global, ISBN: 978-1-60960-551-3, February 2011.
[7] Khalifa Djemal, Hichem Maaref and Rostom Kachouri, Image Retrieval System in Heterogeneous Database, In: Automation Control – Theory and Practice, edited by A. D. Rodic, pages 327-350, INTECH, ISBN: 978-953-307-039-1, December 2009.

Subject:
In consultation with the various departments of TICE, the work will begin by defining a data collection plan with a suitable storage strategy. Next, a data integration tool will be designed around a database management system that stores the data warehouse. These data, often heterogeneous and coming from multiple connected sources, will be integrated into the data warehouse after conditioning (normalization). These tools then make it possible to retrieve the data and to provide descriptive statistics on them.
The research in this thesis will focus mainly on designing and building a system for interpreting and retrieving data from heterogeneous sources, enabling optimal and interactive management of the information produced by the connected objects of TICE's transport network. To this end, a statistical learning model will be developed and deployed for the interpretation and interactive retrieval of the data. The resulting features will be evaluated within an OLAP architecture with a decision-support information system that can bring relevant and innovative technological solutions to TICE. Data visualization and interaction modalities will also be studied in order to offer a multimodal, cross-platform human-machine interface.

Practical arrangements:
This thesis will be carried out alternately at the IBISC laboratory (IRA2 team) in Evry, France, and at TICE in Evry.

The selected candidate will be enrolled as a PhD student in the doctoral school Sciences et technologies de l’information et de la communication (STIC) of Université Paris-Saclay.
The thesis will be supervised by Khalifa DJEMAL (UEVE), co-supervised by Samir Otmane (UEVE), and co-advised by Karine Hallouin of TICE.

Applications via the ADUM platform: https://www.adum.fr/index.pl
Contact: Khalifa DJEMAL: khalifa.djemal@univ-evry.fr

Candidate profile:
Holder of a research master's degree (or equivalent) in computer science, operations research, or machine learning and artificial intelligence.
– Skills in software development and databases, and a solid scientific background.
– Interest in design and rapid prototyping, testing, and evaluation with end users.
– Good command of French and English (spoken and written).
– Desired qualities: strong motivation, autonomy, rigor, initiative, openness to multidisciplinary approaches.

Required training and skills:
Research master's degree (or equivalent) in computer science, operations research, or machine learning and artificial intelligence. Skills in software development and databases, and a solid scientific background.

Work address:
Contact: Khalifa DJEMAL: khalifa.djemal@univ-evry.fr

Laboratoire IBISC,
Université d’Evry Val d’Essonne
40 rue du Pelvoux
91020 Evry cedex

Attached document: Thèse-Cifre-DJEMAL-OTMANE-TICE-2018.pdf

Categories: theses

Information diffusion in social networks

Nov 29 – Nov 30 all-day

Announcement related to the Action/Network: none

Laboratory/Company: Laboratoire Informatique de Bourgogne EA 7534
Duration: 36 months
Contact: hocine.cherifi@u-bourgogne.fr
Publication deadline: 2019-11-29

Context:
Understanding the mechanisms by which discourse, opinions, fake news, and rumors circulate, diffuse, and propagate in online social networks is becoming a societal issue. Virality and bot detection have been studied during major events such as the 2016 US presidential election [Kollanyi 2016], Brexit [Howard 2016], and the 2017 French presidential election [Ferrara 2017]. The algorithms developed use machine learning techniques that aggregate several hundred features. They allow fairly reliable detection on specific events, but they do not generalize, require a costly training phase, and shed light on only some aspects of the study.

Understanding these mechanisms raises several questions:
• What is the impact of network topology on the diffusion of a viral message (and how important is it relative to message content)?
• Which types of relations in a network are most likely to amplify diffusion?
• How do communities, and especially polarized communities, take part in diffusion?
• What is the role of bots, and how do they behave?
• How do influential people or opinion leaders act in this diffusion?
• Can the observed mechanisms be generalized, that is, which typical patterns, structures, or processes contribute to or condition large-scale diffusion?

Subject:
To answer these questions, several algorithms have been developed and validated on specific datasets to detect communities [Drif 2014, Orman 2012], influential users [Riquelme 2016, Ibnoulouafi 2018], bots [Ferrara 2016], events [Atefeh 2015], viral messages and rumors [Sela 2017, Zubiaga 2018, Hoang 2011], etc.

The issues addressed in this thesis are the following:

• The first concerns data modeling and aims to develop a data model [Leclercq 2018] based on the notion of multilayer complex network [DeDomenico 2013, Kivela 2014], which will make it possible to represent and exploit the different relations by giving them appropriate semantics. For example, Twitter, through the richness of the relations arising from its operators (follow, retweet, mention, etc.), generates complex networks whose semantics are hidden. It is therefore a good testbed.

• The second concerns the combination of algorithms to answer questions raised by social science researchers. This is a major challenge, because it makes it possible to understand complex processes, to build models, and to test them on data. For example, the study of influence cannot be separated from the notion of community [Weng 2013, Kumar 2018, Gupta 2016, Gupta 2015] and of community boundaries. The same holds for the propagation of viral messages, which can spread within an (already convinced) community, propagate outside it, or even modify the community structure. One promising direction is to use graph embedding techniques [Hongyun 2017, Goyal 2017].
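The idea of a multilayer network built from Twitter-style relations can be sketched very simply: one set of directed edges per relation type, with the option of querying a single layer or all layers together. The code below is a minimal illustration only. The class `MultilayerNetwork`, its methods, and the sample accounts are hypothetical, and the sketch is unrelated to the tensor-based model of [Leclercq 2018].

```python
from collections import defaultdict


class MultilayerNetwork:
    """Directed multilayer graph: one edge set per relation type (layer)."""

    def __init__(self):
        # layer name -> set of (source, target) pairs
        self.layers = defaultdict(set)

    def add_edge(self, layer, source, target):
        self.layers[layer].add((source, target))

    def out_degree(self, node, layer=None):
        """Out-degree of `node` in one layer, or summed over all layers."""
        names = [layer] if layer is not None else list(self.layers)
        return sum(1 for name in names for (s, _) in self.layers[name] if s == node)

    def aggregate(self):
        """Collapse the layers into one weighted graph: edge -> number of layers."""
        weights = defaultdict(int)
        for edges in self.layers.values():
            for edge in edges:
                weights[edge] += 1
        return dict(weights)


# Hypothetical accounts and interactions, for illustration only.
net = MultilayerNetwork()
net.add_edge("follow", "alice", "bob")
net.add_edge("retweet", "alice", "bob")
net.add_edge("mention", "carol", "alice")

print(net.out_degree("alice"))             # counted across all layers
print(net.out_degree("alice", "retweet"))  # counted in one layer
print(net.aggregate())
```

Keeping the layers separate preserves the meaning of each relation (a follow is not a retweet), whereas the aggregated view loses it. That loss is one reason community or influence measures computed on a flattened graph can be misleading.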

From an experimental point of view, the thesis will first implement various analysis algorithms, relying on datasets already collected by the team, in order to:
1) measure the audience/impact of a topic or an event [Atefeh 2015]. One point to address is the virality of discourse and the impact of bots on its propagation [Ferrara 2016];
2) show the existence of communities within which discourse circulates. These communities must be characterized [Basaille 2018, Jebabli 2014, Jebabli 2015] to highlight their particularities. However, communities nest and overlap, so their mutual influence on the circulation of discourse must also be studied;
3) study the influence of one community on another and the people who link them (elasticity of boundaries), and consequently revisit the notions of influencer [Azaza 2016, Jebabli 2015a] and opinion leader in light of how information circulates.
Second, based on the experimental results, one or more diffusion models will be developed and tested on new datasets, exploiting their predictive power to validate their explanatory power [Jebabli 2018].

From a more theoretical point of view, one of the issues addressed is the adaptation and combination of traditional algorithms for multilayer complex networks and the extension of graph embedding techniques. Proofs of the properties of the proposed algorithms will have to be addressed, for example convergence and quality measures with respect to ground truth.

This thesis focuses on the fundamental aspects of data models and analysis tools for complex networks, but builds on institutional collaborations established since 2013 with the humanities and social sciences laboratories TIL and CIMEOS of the Université de Bourgogne, through the interdisciplinary projects TEE 2014, TEP 2017, and PEPS CNRS MOMIS.

Bibliography:
[Atefeh 2015] Atefeh, Farzindar, and Wael Khreich. “A survey of techniques for event detection in twitter.” Computational Intelligence, 31.1 (2015): 132-164.
[Azaza 2016] Azaza, Lobna, Kirgizov, Sergey, Savonnet, Marinette, et al. Information fusion-based approach for studying influence on twitter using belief theory. Computational Social Networks, 2016, vol. 3, no 1, p. 5-25.
[Basaille 2018] Basaille Ian, Plateforme pour la gestion des données issues des réseaux sociaux dans le cadre de la gestion de la relation client, Thèse de doctorat, Université de Bourgogne, 2018.
[Jebabli 2018] Jebabli M., Cherifi H., Cherifi C., Hammouda A., “Community detection algorithm evaluation with ground-truth data”, Physica A: Statistical Mechanics and its Applications 492, 651-706, Elsevier, 2018
[Jebabli 2015] Jebabli M., Cherifi H., Cherifi C., Hammouda A., “Overlapping community detection versus ground-truth in AMAZON co-purchasing network” in 11th International Conference on Signal Image Technology & Internet-Based Systems, Proceedings of IEEE 2015
[Jebabli 2015a] Jebabli M., Cherifi H., Cherifi C., Hammouda A., “User and group networks on YouTube: A comparative analysis”, in 12th International Conference on Computer Systems and Applications (AICCSA), Proceedings of IEEE 2015
[Jebabli 2014] Jebabli M., Cherifi H., Hammouda A., “Overlapping Community Structure in Co-authorship Networks: A Case Study,” in 7th International Conference on u- and e- Service, Science and Technology (UNESST), Proceedings of IEEE, pp. 26-29, 2014
[Leclercq 2018] Leclercq, Eric, Savonnet, Marinette, “Modèle tensoriel pour l’entreposage et l’analyse des réseaux sociaux – Application à l’étude de la viralité sur Twitter”, INFORSID 2018, to appear.
[DeDomenico 2013] De Domenico, Manlio, et al. “Mathematical formulation of multilayer networks.” Physical Review X 3.4 (2013): 041022.
[Drif 2014] Drif, Ahlem, and Abdallah Boukerram. “Taxonomy and survey of community discovery methods in complex networks.” International Journal of Computer Science and Engineering Survey 5.4 (2014): 1.
[Ferrara 2016] Ferrara, Emilio, et al. “The rise of social bots.” Communications of the ACM 59.7 (2016): 96-104.
[Ferrara 2017] Ferrara, Emilio. “Disinformation and social bot operations in the run up to the 2017 French presidential election.” (2017).
[Goyal 2017] P Goyal, E Ferrara, “Graph embedding techniques, applications, and performance: A survey”, arXiv preprint arXiv:1705.02801, (2017).
[Gupta 2016] Gupta N., Singh A., Cherifi H., “Centrality measures for networks with community structure”, Physica A: Statistical Mechanics and its Applications 452, 46-59, Elsevier 2016
[Gupta 2015] Gupta N., Singh A., Cherifi H., “Community-based Immunization Strategies for Epidemic Control” in 7th International Conference on Communication Systems and Networks, Proceedings of IEEE, 2015
[Hoang 2011] Hoang, Tuan-Anh, et al. “On modeling virality of twitter content.” International Conference on Asian Digital Libraries. Springer, Berlin, Heidelberg, 2011.
[Hongyun 2017] Cai, Hongyun, Vincent W. Zheng, and Kevin Chen-Chuan Chang. “A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications.” arXiv preprint arXiv:1709.07604 (2017).
[Howard 2016] Howard, Philip N., and Bence Kollanyi. “Bots, #StrongerIn, and #Brexit: Computational propaganda during the UK-EU referendum.” (2016).
[Ibnoulouafi 2018] Ibnoulouafi A. El Hassouni M., Cherifi, “M-Centrality: Identifying key nodes based on global position and local degree variation” In revision for Journal of Statistical Mechanics: Theory and Experiment
[Kivela 2014] Kivelä, Mikko, et al. “Multilayer networks.” Journal of complex networks 2.3 (2014): 203-271.
[Kollanyi 2016] Kollanyi, Bence, Philip N. Howard, and Samuel C. Woolley. “Bots and automation over Twitter during the first US Presidential debate.” Comprop data memo 1 (2016): 1-4.
[Kumar 2018] Kumar M., Singh A., Cherifi H., “An efficient Immunization Strategy using Overlapping Nodes and its neighborhoods”, to appear in the proceedings of the Web Conference 2018 (WWW18), Lyon, France
[Orman 2012] Orman G.K, Labatut V., and Cherifi H., “Comparative evaluation of community detection algorithms: a topological approach”, Journal of Statistical Mechanics: Theory and Experiment, P08001, august 2012.
[Riquelme 2016] Riquelme, Fabián, and Pablo González-Cantergiani. “Measuring user influence on Twitter: A survey.” Information Processing & Management 52.5 (2016): 949-975.
[Sela 2017] Sela, Alon, et al. “Increasing the Flow of Rumors in Social Networks by Spreading Groups.” arXiv preprint arXiv:1704.02095 (2017).
[Varol 2018] Varol, Onur. Analyzing Social Big Data to Study Online Discourse and Its Manipulation. Diss. Indiana University, 2017.
[Weng 2013] Weng, Lilian, Filippo Menczer, and Yong-Yeol Ahn. “Virality prediction and community structure in social networks.” Scientific reports 3 (2013): 2522.
[Zubiaga 2018] Zubiaga, Arkaitz, et al. “Detection and Resolution of Rumours in Social Media: A Survey.” ACM Computing Surveys (CSUR) 51.2 (2018): 32.

Profil du candidat :
The candidate must therefore be able to carry out innovative, high-quality research. He/she will pursue research aimed at (i) addressing fundamental theoretical problems and (ii) designing models and algorithms that can be used to understand diffusion processes in multilayer networks.

He/she must have a good knowledge of applied mathematics, linear algebra, statistics and computer science (algorithmics).

The other requirements are:

Good command of programming languages such as R and Python for data analysis, and C++ or equivalent for computer simulations, database access and storage.
Experience with, or an interest in developing, efficient techniques for large-scale data analysis.
Very good command of English (spoken and written) and excellent communication skills.
Curiosity, autonomy, integrity and creativity are desired qualities.

Formation et compétences requises :
The candidate must hold a master's degree (or equivalent) in computer science, applied mathematics or a related discipline, obtained with a very good final grade (an average grade of B or higher).

Adresse d’emploi :

The Data Science team (Équipe Science des Données) of the Laboratoire d’Informatique de Bourgogne focuses strongly on quantitative yet applied research, with recognized expertise in information systems, AI, Big Data and complex networks.
It offers:
– The opportunity to complete a PhD in network science and Big Data, drawing on complex systems tools and the modeling of socio-economic interactions.
– Broad and independent work within a dynamic team in a positive working atmosphere.
– A complete career development program (participation in summer schools, conferences, etc.).

Thesis supervisor: Hocine Cherifi
Co-supervisors: Eric Leclercq and Marinette Savonnet

How to apply

Applications must be sent by e-mail and include the following:
• a CV,
• university transcripts,
• a statement of interest (one page maximum),
• the names and contact details of two referees.

Requests for further information can be sent to the contacts (addresses below).
Use the following subject line for your correspondence: Thèse diffusion d’information dans les réseaux sociaux

Contacts :
Hocine Cherifi – Hocine.Cherifi@u-bourgogne.fr
Marinette Savonnet – Marinette.Savonnet@u-bourgogne.fr
Eric Leclercq – Eric.Leclercq@u-bourgogne.fr

Laboratoire d’Informatique de Bourgogne (LIB) – EA 7534
Équipe Science de Données
Université de Bourgogne
9, Avenue Alain Savary
21078 Dijon

Document attaché : SujetDIR.pdf

Categories: theses

High-Dimensional Machine Learning with Multiple Measurement Data Vectors

Dec 1 – Dec 2 all-day

Annonce en lien avec l’Action/le Réseau : aucun

Laboratoire/Entreprise : Université de Lille – Laboratoire CRIStAL
Durée : 36 mois
Contact : remy.boyer@univ-lille.fr
Date limite de publication : 2019-12-01

Contexte :
Thesis supervisor (last name, first name): Boyer Rémy
Web (old): http://www.l2s.centralesupelec.fr/perso/remy.boyer
Web (new): https://www.cristal.univ-lille.fr/profil/remyboyer?lang=en

Thesis co-supervisor (last name, first name): Boulanger Jérémie

Supervisor's host laboratory: CRIStAL – UMR 9189

Date the supervisor obtained the habilitation to supervise research (HDR): 11/2012

Supervisor's address: Cité Scientifique, bâtiment P2, 59655 Villeneuve d’Ascq Cedex

Supervisor's phone: 0320434567

Supervisor's e-mail: remy.boyer@univ-lille.fr

Co-supervisor's e-mail: jeremie.boulanger@univ-lille.fr

Sujet :
An ever-growing volume of high-dimensional data is generated every day in many applications. This creates strong demand for algorithms able to extract useful information from this mass of data. Machine learning is concerned with developing such algorithms, which learn from the data. Classical applications of these techniques range from automatic text categorization, classification among different types of data, weather forecasting, and content recommendation on video-on-demand or online music platforms to spam filtering. In this respect, neural networks are a powerful machine learning tool [1,2], able to handle nonlinear data. In the case of multiple measurements, the observed data are generally multidimensional and can be viewed as multiple measurement vectors (MMV). The goal is then to classify the data into one of the categories. In this case, one defines the tensor that generalizes the output of a neural layer, together with the associated score function. Note that similar cost functions have already been considered in [6,7].

Tensor factorization [9] and deep learning have developed rapidly in many scientific fields such as psychology, chemistry, neuroscience, signal processing, image processing, bioinformatics and data mining [8]. Many modalities and parameters are generally present in any practical data measurement setting, such as acquisition conditions, recorded channels, temporal and spatial sampling, temperature, and so on.

Powerful mathematical tools from tensor algebra can be used to extract relevant features through linear transformations. Unfortunately, both the storage complexity and the computational complexity grow exponentially with the dimension or the number of parameters. The proposed work consists of two parts:

– Exploring advanced tensor factorization methods [10] to limit the problems linked to dimensionality.

– Proposing new gradient back-propagation algorithms adapted to the topology of the tensor decomposition on graphs. Since this part of machine learning is computationally expensive, using graph-based tensor decompositions should yield methods with much lower complexity than a direct implementation.
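
To make the dimensionality argument concrete, here is a minimal sketch comparing the storage of a dense tensor with that of a Tensor-Train (TT) representation in the sense of [10], and showing how one entry is read back from TT cores. The helper names (`full_storage`, `tt_storage`, `tt_entry`), the mode sizes and the TT ranks are illustrative assumptions, not part of the thesis subject.

```python
from math import prod

def full_storage(dims):
    """Number of entries of a dense tensor with the given mode sizes."""
    return prod(dims)

def tt_storage(dims, ranks):
    """Number of entries of a Tensor-Train representation.

    ranks has len(dims) + 1 entries with ranks[0] == ranks[-1] == 1;
    core k has shape (ranks[k], dims[k], ranks[k + 1]).
    """
    if len(ranks) != len(dims) + 1 or ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("ranks needs len(dims) + 1 entries, starting and ending with 1")
    return sum(ranks[k] * dims[k] * ranks[k + 1] for k in range(len(dims)))

def tt_entry(cores, index):
    """Evaluate one tensor entry from TT cores stored as nested lists.

    cores[k][a][i][b] is entry (a, i, b) of core k. The tensor entry is the
    product of the matrices obtained by fixing the middle index of each core.
    """
    row = [1.0]  # 1 x r_0 row vector, with r_0 == 1
    for core, i in zip(cores, index):
        r_next = len(core[0][i])
        row = [sum(row[a] * core[a][i][b] for a in range(len(row)))
               for b in range(r_next)]
    return row[0]

# Assumed example: an order-6 tensor with 10 values per mode and TT ranks of 5.
dims = [10] * 6
ranks = [1, 5, 5, 5, 5, 5, 1]
print(full_storage(dims))       # 1000000 entries for the dense tensor
print(tt_storage(dims, ranks))  # 1100 entries for the TT cores

# Rank-1 example: T[i][j] = u[i] * v[j] with u = (2, 3) and v = (5, 7).
cores = [[[[2.0], [3.0]]], [[[5.0], [7.0]]]]
print(tt_entry(cores, (1, 0)))  # 15.0
```

The dense count grows as the product of the mode sizes, whereas the TT count grows linearly in the number of modes for fixed ranks; whether small ranks suffice depends on the data.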

[1] Y. LeCun, B. E Boser, J. S Denker, D. Henderson, R. E Howard, W. E Hubbard, and L. D Jackel. Handwritten digit recognition with a back-propagation network. In Advances in neural information processing systems, pp. 396-404, 1990.

[2] Y. LeCun, Y. Bengio, et al. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, 3361(10), 1995.

[3] D.L. Donoho (2000). High-dimensional data analysis : The curses and blessings of dimensionality. AMS Math Challenges Lecture, 1, 32 pages.

[4] R. Boyer, R. Badeau and G. Favier, Fast Orthogonal Decomposition Of Volterra Cubic Kernels Using Oblique Unfolding, IEEE, Proc. of International Conference on Acoustics, Speech, and Signal, Processing, (ICASSP’11)

[5] JH. Goulart, M. Boizard, R. Boyer, G. Favier, and P. Comon, Tensor CP Decomposition with Structured Factor Matrices: Algorithms and Performance, IEEE Journal of Selected Topics in Signal Processing, Volume 10, No. 4, June 2016, pp. 757-769.

[6] N. Cohen, O. Sharir, and A. Shashua. On the expressive power of deep learning : A tensor analysis. In Conference on Learning Theory, pp. 698-728, 2016.

[7] E. Stoudenmire and D. J Schwab. Supervised learning with tensor networks. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (eds.), Advances in Neural Information Processing Systems 29, pp. 4799-4807. Curran Associates, Inc., 2016.

[8] N. D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang, E.E. Papalexakis, and C. Faloutsos, (2017) Tensor decomposition for signal processing and machine learning. IEEE Transactions on Signal Processing, 65(13), 3551-3582.

[9] A. Cichocki et al., Tensor decompositions for signal processing applications : From two-way to multiway component analysis, IEEE Signal Process. Mag., vol. 32, no. 2, pp. 145-163, Mar. 2015.

[10] I. V. Oseledets, Tensor-Train decomposition, SIAM J. Scientific Computing, vol. 33, no. 5, pp. 2295-2317, 2011.

Profil du candidat :
Skills in statistics, algebra and machine learning are desired. The candidate should have an interest in methodological and academic research.

Formation et compétences requises :
Research master's degree or engineering school specialization in data science and/or high-dimensional statistics.

Adresse d’emploi :
CRIStAL – UMR 9189
Cité Scientifique
59655 Villeneuve d’Ascq Cedex

Document attaché : Tensorized-NN_PhD_Subject.pdf

Categories: theses

Reducing the volume of automotive customer usage data

Dec 1 – Dec 2 all-day

Annonce en lien avec l’Action/le Réseau : Formation

Laboratoire/Entreprise : I2M/RENAULT
Durée : 3 ans
Contact : badihghattas@gmail.com
Date limite de publication : 2019-11-31

Contexte :
CIFRE industrial PhD, RENAULT – Institut de Mathématiques de Marseille.

Sujet :
Reducing the volume of automotive customer usage data

See attached document

Profil du candidat :
General engineering school or one specialized in statistics (ENSTA, INSA, Centrale, ENSAI), or a master's degree in statistics or data science

Formation et compétences requises :
Statistics, Machine Learning, Big Data, R/Python and possibly knowledge of the automotive industry.

Adresse d’emploi :
RENAULT, Technocentre Guyancourt.

Document attaché : Sujet-thèse-CIFRE-Usages-Clients-2019-07-02.pdf

Categories: theses

Deep learning for the semantic segmentation of SAR ocean images

Jan 1 – Jan 2 all-day

Annonce en lien avec l’Action/le Réseau : aucun

Laboratoire/Entreprise : Lab-STICC/CLS
Durée : 36 mois
Contact : ronan.fablet@imt-atlantique.fr
Date limite de publication : 2020-01-01

Contexte :
There is an overwhelming amount of data from the Copernicus Sentinel-1 satellites. For instance, thousands of images are produced every day, representing a daily average of 3.45 TB of published SAR data. A significant share covers the ocean surface and is used for a wide range of applications involving public and private stakeholders. Notwithstanding the current impact of Sentinel-1 images on the oceanographic community, we believe the imaging capabilities of these C-band SAR data acquired over the ocean surface are not fully exploited.
Beyond wind field measurements, sea ice monitoring or oil spill detection, a set of metocean features is well observed by S-1 sensors. To name a few, atmospheric fronts, oceanic fronts, rain cells, micro-convective cells, internal waves, gravity waves, biological slicks, upwelling and wind streaks are phenomena of potential interest for many end users, yet they are generally discarded in SAR images. For academic science and industry, the proposed advanced monitoring of these metocean mechanisms could trigger new research perspectives and new potential applications and services. In addition, flagging these features could benefit current products and services by providing better data quality. With recent progress in computer vision thanks to Deep Learning approaches, together with the rise of large databases and higher computing power, these flagging activities are now possible.

Sujet :
See detailed description in attached document or at:
http://sites.ieee.org/france-grss/files/2019/07/Thesis_CLS-IMT_DL-SAR.pdf

Profil du candidat :
MSc or engineering degree in computer science, data science, applied mathematics, signal processing and/or remote sensing

Formation et compétences requises :
Skills:
o Master of Science (or equivalent) in Applied Mathematics, Computer Science or Machine (Deep) Learning
o Good programming skills (Python) with proven experience (e.g. a GitHub project)
o Ideally, experience with cloud computing (Docker, Kubernetes…)

• Know-how: fluency in written communication (writing technical notes and scientific articles, in English in particular) and oral communication (presentations at contractual meetings or scientific conferences), work organization, scientific rigor.

• Soft skills: dynamism, enthusiasm, good interpersonal skills, autonomy, capacity for innovation, taste for teamwork.

Adresse d’emploi :
Lab-STICC, IMT Atlantique, Brest
CLS, Brest

Document attaché : Thesis_CLS-IMT_DL-SAR.pdf

Categories: theses

Smart-healthcare System with Federated Learning

Mar 15 – Mar 16 all-day

Annonce en lien avec l’Action/le Réseau : MACLEANDoctorants

Laboratoire/Entreprise : GEMTEX, ENSAIT, France et University of Kent, Kent, UK
Durée : 36 mois
Contact : quoc-thong.nguyen@ensait.fr
Date limite de publication : 2020-03-15

Contexte :
The World Health Organization (WHO) World Report on Ageing and Health (2015) shows that global population aging is becoming a more serious problem. The proportion of the population aged over 60 will increase from 12% in 2015 to 22% in 2050. As this share nearly doubles, the number of people aged 60 and over will reach 2 billion over the next 35 years. The rising demand for and cost of healthcare are a challenge, given the large populations involved and the difficulty of covering all patients with the available doctors. One possible solution is to bring wearable computing and Internet of Things (IoT) technology into healthcare. After an operation, patients usually go through a rehabilitation process in which they follow a strict routine. With the help of smart garments, all of the patient’s physiological signals and behaviors can be monitored. The system can be tuned to the requirements of the individual patient, and doctors can observe the patient’s health status and behavior remotely.

The candidate will work 50% of the time in the Human-Centered Design (HCD) team of the GEMTEX research laboratory, ENSAIT, France.
The candidate will also work 50% of the time at the School of Computing, University of Kent, UK.
PhD diplomas are issued by both École Centrale de Lille and the University of Kent.

Net salary: ~1450 EUR, plus 600-800 EUR during the period in the UK.

Sujet :
The aim of this project is to propose an AI- and cloud-enabled smart healthcare system. To collect the patient’s health status data, we can use an “intelligent garment” instead of wearable sensors. In the intelligent garment, body sensors are integrated into the textile, which requires taking various factors into consideration, such as sensor type, strategic locations for sensor placement, the layout of flexible electrical cables, weak-signal acquisition equipment, low-power wireless communications, and user comfort. The pulse sensor, body temperature sensor, electrocardiography (ECG) sensor, myocardial sensor, blood oxygen sensor, electroencephalography (EEG) sensor and batteries are all connected with flexible wires. To make the smart clothing easy to wash, all non-waterproof components can be detached by undoing the garment’s snap buttons. Users remove these components before washing and then snap them back onto the garment. We propose to develop a smart healthcare platform composed of three key components: 1) federated learning models that are trained on data stored in the patients’ different homes, without the data ever being shared with a hospital’s or a tech company’s servers, 2) one or several computing devices that serve locally as “edge” servers, and 3) the intelligent garment, which can communicate with the edge device(s). The PhD student will extend the state of the art in Federated Learning and Deep Learning applications for smart healthcare systems. An important part of his/her work will be devoted to publishing in peer-reviewed journals and presenting at relevant conferences.
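
The first component, training without moving patient data, can be sketched with weighted federated averaging, one common federated learning scheme that is used here only as an assumed example. Each home fits a one-parameter linear model on its own readings and sends back only the fitted weight; the function names, the toy readings and the learning rate are hypothetical.

```python
def local_update(w, data, lr=0.1, epochs=20):
    """Gradient descent on one home's private (x, y) pairs for the model y = w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(local_weights, sizes):
    """Server step: average the local weights, weighted by each home's sample count."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(local_weights, sizes)) / total

def train(homes, rounds=10, w=0.0):
    """Run several rounds; only model weights travel, never the raw data."""
    for _ in range(rounds):
        local = [local_update(w, data) for data in homes]  # runs at each edge device
        w = federated_average(local, [len(d) for d in homes])
    return w

# Hypothetical readings from two homes, all generated by y = 2 * x.
homes = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(0.5, 1.0), (1.5, 3.0), (3.0, 6.0)],
]
w = train(homes)
print(round(w, 4))  # close to the true slope 2.0
```

A real system would replace the scalar model with a deep network and would also have to handle unreliable devices and privacy leakage through the shared weights, which this sketch ignores.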

Profil du candidat :
1. Applicants should hold a master’s degree or equivalent in Computer Science, Automation or a closely related area.

2. A solid background in computer science/engineering.

3. Ability to work on interdisciplinary research projects.

4. Language proficiency requirements met (equivalent to IELTS 6.5 or higher).

Formation et compétences requises :
1. Good knowledge of machine learning and statistics.
2. Programming skills in Python.
3. At least one international publication.

Interested candidates should send their CV and motivation letter by e-mail, under reference SHSFL, to quoc-thong.nguyen@ensait.fr by 15/03/2020.

Adresse d’emploi :
ENSAIT (École nationale supérieure des arts et industries textiles), Roubaix, France

University of Kent, UK

Document attaché : Smart-Healthcare-System-with-Federated-Learning.pdf

Categories: theses

ACDC with deep learning : Automatic Crater Detection and Characterization with deep learning

Mar 31 – Apr 1 all-day

Offre en lien avec l’Action/le Réseau : – — –/Doctorants

Laboratoire/Entreprise : GEOPS, Central SUPEL
Durée : 3 years
Contact : frederic.schmidt@u-psud.fr
Date limite de publication : 2020-03-31

Contexte :
URGENT: DEADLINE 30 MARCH 2020

This study takes place amid the data deluge from the numerous space missions across the Solar System. The project proposes to develop a tool to automatically detect and characterize the most ubiquitous feature on planetary bodies: craters. The PhD is co-funded by the French Space Agency (CNES).

Sujet :
The aim is to develop a tool that determines the precise size and position of all craters in a scene, whatever the illumination conditions, the type of sensor and the scale. As a second goal, the project will determine crater characteristics, such as primary/secondary origin (a secondary crater is formed by ejecta from a previous impact, not by a direct impactor), presence or absence of rays, erosion level…
This study will take advantage of open-source machine learning and deep learning libraries to propose the most versatile and robust detection method. We propose to develop a new tool dedicated to this task. In addition, as an open-source strategy, we propose to organize a worldwide challenge open to any researcher or student, in a framework called RAMP. This platform is designed for collaborative work and gives access to the participants’ source code (not only the results).
Such a software pipeline is required to tackle fundamental questions in planetary science and to study surface processes across the Solar System. It will be a crucial tool for precisely dating surfaces and will open a new era of onboard decisions on landing or targeting, maximizing the science return of future deep space missions.

Profil du candidat :
Engineering degree or M2 in one or more of the following fields: Signal Processing, Data Science, Remote Sensing, Planetary Science, Astrophysics

Formation et compétences requises :
The candidate must have an engineering or master’s degree in machine learning/data mining or in planetary science/astrophysics. Competence in both fields is encouraged. Excellent programming skills are required (Python, Linux). We expect the candidate to communicate well in English (written and oral).

Adresse d’emploi :
UMR8148 GEOPS
Bât 509, Université Paris Saclay
91405 ORSAY, FRANCE

Document attaché : 202003201406_ACDC.pdf

Categories: theses

CONTRIBUTION OF MACHINE LEARNING TO THE INTEGRATION OF SATELLITE OBSERVATIONS INTO A GLOBAL MODEL OF THE SOIL-PLANT SYSTEM

Mar 31 – Apr 1 all-day

Annonce en lien avec l’Action/le Réseau : aucun

Laboratoire/Entreprise : CNRM, Meteo France
Durée : 36 months
Contact : nemesio.rodriguez@cesbio.cnes.fr
Date limite de publication : 2020-03-31

Contexte :
PhD funded by CNES and Météo-France.

Remote sensing and modeling of land surfaces.

The LDAS-Monde data assimilation system developed by CNRM assimilates level-2 products of surface soil moisture and vegetation leaf area index. Most level-1 products (for example, brightness temperatures from microwave radiometers or radar backscatter coefficients) contain information on both soil moisture and vegetation. Some variables, such as surface albedo, show strong temporal variability in response to solar illumination and surface conditions (canopy structure, soil moisture, etc.) that is difficult to represent with physical models.

Sujet :
In a context of climate change and a probable future increase in the frequency and intensity of extreme events, agricultural droughts in particular, the response of vegetation to climate needs to be better represented. Monitoring the impact of extreme events on land surfaces involves many variables of the soil-plant system, such as soil water content and the leaf area index (LAI) of vegetation. These variables can be monitored in two ways: (1) using the unprecedented volume of observations provided by the fleet of Earth observation satellites, or (2) using land surface models. A third solution is to combine all the available information by integrating satellite observations into the models. This process is called data assimilation. It produces an analysis of land variables that is the best possible estimate, because the input information is weighted to account for uncertainties. It is only possible for observations that the model can simulate. Level-2 or level-3 satellite products are biophysical variables that models can simulate. These products are derived from level-1 products such as brightness temperatures, radiances, reflectances or radar backscatter coefficients. While level-1 products are close to the physical observation made by the onboard sensors, higher-level products result from an interpretation of the level-1 observation. This process generates a cascade of uncertainties that is difficult to quantify in data assimilation. It is therefore preferable to assimilate level-1 products. The objective of the thesis is to develop the assimilation of level-1 products in the ISBA model of the SURFEX modeling platform. ISBA is used in the LDAS-Monde data assimilation system (Albergel et al. 2017). The observation operators this requires will be based on machine learning (for example Rodríguez-Fernández et al., 2019) and will cover level-1 products from the SMOS, ASCAT, Sentinel-1, SPOT-VGT, PROBA-V and Sentinel-3 satellites.

– Albergel et al., https://doi.org/10.5194/gmd-10-3889-2017, 2017
– Rodríguez-Fernández et al., https://www.mdpi.com/2072-4292/11/11/1334, 2019
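
As an illustration of how data assimilation weights its inputs by their uncertainties, here is a minimal scalar sketch of the standard analysis update. It is not the LDAS-Monde implementation: the function name `analysis`, the linear observation operator `h` (a stand-in for a learned operator) and all numerical values are assumptions made for the example.

```python
def analysis(x_b, var_b, y, var_o, h=1.0):
    """Combine a model background x_b with an observation y.

    var_b and var_o are the error variances of the background and of the
    observation; h is a linear observation operator mapping the model
    variable to the observed quantity (placeholder for a learned operator).
    Returns the analysis value and its error variance.
    """
    gain = var_b * h / (h * h * var_b + var_o)  # weight given to the observation
    x_a = x_b + gain * (y - h * x_b)            # background corrected by the innovation
    var_a = (1.0 - gain * h) * var_b            # uncertainty after assimilation
    return x_a, var_a

# Hypothetical values: modelled soil moisture 0.20 and observed 0.30 (m3/m3),
# with equal error variances, so the analysis lands halfway between the two.
x_a, var_a = analysis(x_b=0.20, var_b=0.01, y=0.30, var_o=0.01)
print(round(x_a, 3), round(var_a, 4))
```

When the observation is much less certain than the model (large `var_o`), the gain tends to zero and the analysis stays near the background, which is the weighting behaviour described above.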

Profil du candidat :
– Engineer who completed a final-year internship on a research topic
– Holder of a research master’s degree in physics or applied mathematics

Formation et compétences requises :
Candidates should have some knowledge of data assimilation techniques and machine learning, and possibly of land surface modeling and/or remote sensing. Knowledge of the Python language is required for data analysis, as well as programming experience.

Adresse d’emploi :
Météopole, Toulouse

More information:
https://recrutement.cnes.fr/fr/annonce/895060-168-contribution-of-artificial-intelligence-to-the-integration-of-satellite-31100-toulouse

Categories: theses

CNES PhD: Deep Learning & Space Oceanography

Mar 31 – Apr 1 all-day

Annonce en lien avec l’Action/le Réseau : MACLEAN

Laboratoire/Entreprise : IMT Atlantique, UMR CNRS Lab-STICC
Durée : 36 months
Contact : ronan.fablet@imt-atlantique.fr
Date limite de publication : 2020-03-31

Contexte :
New CNES PhD position for fall 2020 on Deep Learning for Space Oceanography in the framework of AI Chair Oceanix (https://rfablet.github.io/projects/2019-oceanix).

Sujet :
Artificial Intelligence (AI) technologies, models and strategies open new paradigms for the modeling, simulation, forecasting and reconstruction of complex systems, including ocean-atmosphere dynamics. Due to the irregular space-time sampling of in situ and spaceborne observation data, most envisioned AI-driven strategies rely on learning representations from simulation data and applying these representations to observation data to inform the processes of interest. The applicability of such schemes may then strongly depend on the ability of simulation data to truly match the features of observation data, which may be questioned for numerous processes. The general objective of this PhD is to investigate the extent to which fully observation-driven schemes can be developed in the context of space-based observation by current and future satellite missions such as the SWOT mission. From a methodological point of view, we propose to state this challenge, for given geophysical processes or variables, as the joint end-to-end learning of a latent geophysically-sound representation and of the associated inversion scheme from irregularly-sampled observation datasets. Using CNNs (Convolutional Neural Networks), the targeted methodological contributions are regarded as key building blocks to revisit earth observation challenges, including, among others, operational satellite-derived geophysical products and data-driven schemes for inter-comparison studies between observation and/or simulation data. This PhD will be carried out in the collaborative framework of the Melody project (ANR MN 2020-2022), with strong interactions between Lab-STICC (R. Fablet), LOPS (B. Chapron), IGE/MEOM (J. Le Sommer), OceanNext (C. Ubelman) and OceanDataLab (L. Gaultier). SWOT-related case studies in Melody, e.g. wave-current separation and SWOT-derived SLA L4 products, will be the core application ground for the considered methodological developments, including the exploitation of SWOT fast-sampling phase data.
The PhD candidate will benefit from the combined multidisciplinary expertise of the supervision team in Ocean Science, Ocean Remote Sensing, Fluid Dynamics and Data Science.

Detailed presentation of the PhD: https://rfablet.github.io/files/phd_proposal_rfablet_CNESMelody_201910_1.pdf

Profil du candidat :
The targeted PhD candidate should have an MSc and/or engineering degree in Data Science or Artificial Intelligence with a strong interest in environmental sciences, ideally demonstrated by previous activities or experience. A dual degree in ocean science and data science, as promoted by the Isblue MSc program, would be of key interest. Besides a strong theoretical background, computer skills are expected, including initial experience with state-of-the-art deep learning frameworks (e.g., TensorFlow, PyTorch) and programming environments (e.g., Python, git server).

Formation et compétences requises :
More information on the application procedure at: https://recrutement.cnes.fr/en/annonce/899896-200-end-to-end-learning-of-geophysically-sound-cnn-representations-29200-brest

Adresse d’emploi :
IMT Atlantique, Brest, France

Categories: theses

Automatic configuration of deep neural networks using multi-objective methods

Apr 1 – Apr 2 all-day

Annonce en lien avec l’Action/le Réseau : aucun

Laboratoire/Entreprise : CRIStAL équipe ORKAD
Durée : 3 ans
Contact : julie.jacques@univ-catholille.fr
Date limite de publication : 2020-04-01

Contexte :
ORKAD est une équipe de recherche du groupe thématique OPTIMA du laboratoire CRIStAL (Centre de Recherche en Informatique, Signal et Automatique de Lille) (UMR CNRS 9189) de l’Université de Lille.
L’objectif principal de l’équipe ORKAD est d’exploiter simultanément l’optimisation combinatoire et l’extraction de connaissances pour résoudre des problèmes d’optimisation. Bien que les deux domaines scientifiques se soient développés de manière plus ou moins indépendante, la synergie entre l’optimisation combinatoire et l’extraction de connaissances offre une opportunité d’améliorer les performances et l’autonomie des méthodes d’optimisation grâce à la connaissance et, d’autre part, de résoudre efficacement les problèmes d’extraction de connaissances grâce aux méthodes de recherche opérationnelles [Dhaenens-Jourdan2016]. Nos approches sont principalement basées sur l’optimisation combinatoire mono et multi-objective. Le groupe ORKAD a sur ses travaux des coopérations avec les hôpitaux de la région (CHRU et GHICL), des entreprises (Alicante, Meilleureassurance.com, OVH).
Les méthodes d’optimisation combinatoire présentent l’intérêt d’être rapides et d’explorer de grands espaces de recherche (de nombreuses combinaisons de facteurs) et permettent de gérer partiellement la volumétrie des données. Par exemple, l’algorithme MOCA-I [Jacques2013-a] et ses extensions [Vandromme2015], sont une première approche pour la classification des données hétérogènes et mal réparties par méthode d’optimisation. MOCA-I a été intégré dans un moteur de prédiction de profil patient avec succès permettant de recommander des patients à des inclusions dans des recherches cliniques. Sur l’étude NUTRISEP, il a permis de retrouver en 50 jours autant de patients que l’expert en 1000 jours. Parmi les profils proposés, on retrouve des profils non identifiés par l’expert initialement. MOCA-I a également été testé pour la détection de séjours concernant des patients porteurs de bactéries multi-résistantes (BMR). Il permet de détecter 11,5% de BMR supplémentaires, et 21% de SARM supplémentaires (Staphylococcus aureus résistant à la méticilline).

The widespread collection of digital data means that machine learning on massive data remains a strategic research topic. On the one hand, optimization methods already yield interesting results in white-box form: the results are easy to interpret, but at the cost of heavy computation. On the other hand, deep neural networks are beginning to benefit from substantial computing resources, such as NVIDIA's dedicated compute cores (Tensor Cores). The goal of this thesis is to determine how multi-objective optimization methods can ease the configuration of a neural network's hyperparameters and improve its interpretability. This will bring together deep neural networks, a black-box approach whose results are hard to interpret, and white-box optimization-based learning approaches, on which the ORKAD team has already worked. In addition, ORKAD's experience with the automatic configuration of multi-objective algorithms [Blot2017a] can be put to use in this project.

Sujet :
In this thesis, we will examine what multi-objective optimization can bring to deep neural networks, in order to combine the advantages of both methods: speed, quality and interpretability of the generated models.
The thesis will focus on two aspects:

The first aspect concerns automatic hyperparameter tuning of deep neural networks. Setting up a deep neural network requires empirically choosing the values of many hyperparameters (for example, the activation function or the number of layers). We propose to use automatic algorithm configuration methods such as irace [López-Ibáñez2016] or ParamILS [Hutter2009] to determine the best hyperparameter configuration. The contribution of the multi-objective approach will be to find configurations suited to different environments, that is, to generate neural networks whose characteristics fit both the problem and its environment. The generated networks could be, for example, compact networks that are cheap in CPU time and energy, to run on mobile devices such as Raspberry Pi boards for geo-distributed computing (fog computing) or on connected health devices (fitness watches, heart-rate monitors, etc.). The approach should also make it possible to propose neural networks for compute servers or architectures with dedicated compute cores (Tensor Cores).
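To illustrate what "configurations suited to different environments" can mean in a multi-objective setting, here is a minimal sketch of Pareto filtering over candidate configurations. The configuration names, error values and cost values are invented for illustration, and the sketch is not irace or ParamILS: it only shows how non-dominated trade-offs between accuracy and resource cost could be selected once candidates have been evaluated.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    """A hypothetical evaluated hyperparameter configuration."""
    name: str
    error: float  # validation error, lower is better
    cost: float   # e.g. inference time or energy, lower is better


def dominates(a: Config, b: Config) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a.error <= b.error and a.cost <= b.cost
    strictly_better = a.error < b.error or a.cost < b.cost
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep only the configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Made-up evaluation results (assumed values, not measurements).
candidates = [
    Config("small-relu", 0.12, 1.0),
    Config("medium-relu", 0.08, 3.0),
    Config("medium-tanh", 0.10, 3.5),
    Config("large-relu", 0.05, 9.0),
    Config("large-tanh", 0.07, 10.0),
]

front = pareto_front(candidates)
for c in front:
    print(f"{c.name}: error={c.error}, cost={c.cost}")
```

In this toy example, the cheap end of the front would be a candidate for an embedded device and the accurate end for a compute server; which point to deploy remains a decision for the user.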

The second aspect concerns the interpretability of artificial intelligence (eXplainable Artificial Intelligence, or XAI). The 2018 report of the CCNE (Comité consultatif national d’éthique, the French national ethics advisory committee) recommends using methods that medical staff can easily challenge. White-box methods are therefore to be preferred, which deep neural networks are not. The first step is to study emerging approaches such as BreakDown [Staniak2019], which increase the interpretability of neural networks. The second step is to study, using the approach proposed above, how hyperparameters influence the quality of the interpretation. This will lead to a new version of the proposed approach that maximizes both the quality of the interpretation and the performance of the network.

The interpretability aspect will be enriched by a real use case with one of ORKAD's partners. ORKAD and the GHICL have carried out a substantial body of joint work in the past, through PhD theses and ANR-type research projects. The use case will first serve to assess the need for interpretation, and then to verify the contribution of the proposed method.

Profil du candidat :
Master's degree (Bac+5) in computer science
Good programming skills.

Formation et compétences requises :
Master's degree (Bac+5) in computer science
C++ and/or Python (TensorFlow, scikit-learn, etc.)
Experience in machine learning and/or combinatorial optimization is a plus

Adresse d’emploi :
UMR CRIStAL
Université de Lille – Campus scientifique
Bâtiment ESPRIT
Avenue Henri Poincaré

Document attaché : sujet_mo_nn_xai_2020.pdf

Categories: theses

Genericity and explainability in recommender systems

Apr 1 – Apr 2 all-day

Annonce en lien avec l’Action/le Réseau : aucun

Laboratoire/Entreprise : LAMSADE
Durée : 3 ans
Contact : elsa.negre@lamsade.dauphine.fr
Date limite de publication : 2020-04-01

Contexte :
Recommender systems are often seen as “black boxes”. Making these systems transparent and generic is a real challenge.

Sujet :
The main goal of this thesis is to study the diversity of recommender systems, their commonalities and differences (from both an algorithmic and an application point of view) in a context of large, constantly evolving volumes of data, and to understand such systems in their context. The next step will be to move towards a generic recommender system model able to explain the returned recommendations to the user.

Profil du candidat :
Master's degree (Bac+5) in computer science.
Motivated by research.

Formation et compétences requises :
Machine learning,
Information systems.

Adresse d’emploi :
Université Paris-Dauphine (Paris, France)

Document attaché : Proposition_sujet_these1920_FR.pdf

Categories: theses

PhD position in Explainable Recommender Systems

Apr 1 – Apr 2 all-day

Annonce en lien avec l’Action/le Réseau : aucun

Laboratoire/Entreprise : ETIS/CY Cergy Université
Durée : 3 ans
Contact : aikaterini.tzompanaki@cyu.fr
Date limite de publication : 2020-04-01

Contexte :
A recommender helps the user explore the set of items in a system and find the items most relevant to him/her. The two basic recommender categories are the context- and score-based ones. The first category exploits the characteristics of users and items, while the latter depends on the item scores given by the users. Traditional implementations of recommenders are based on TF-IDF and nearest-neighbor techniques, while more recent recommenders follow machine learning approaches, like matrix factorization and neural networks. A natural issue that comes with recommendations is whether a user, or even the system designer, understands the results of the recommender. This problem has given rise to the so-called explainable recommenders.
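As a rough illustration of the traditional TF-IDF and nearest-neighbor approach mentioned above, the following sketch recommends the item whose description is closest to one the user liked. The item descriptions, the whitespace tokenizer and the `recommend` helper are assumptions made for this example; a real system would use a proper text pipeline and a much larger catalogue.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Build sparse TF-IDF vectors (term -> weight) from raw descriptions."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: number of descriptions containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(liked_index: int, docs: List[str], k: int = 1) -> List[int]:
    """Return the indices of the k items most similar to the liked item."""
    vecs = tfidf_vectors(docs)
    scores = [
        (cosine(vecs[liked_index], v), i)
        for i, v in enumerate(vecs)
        if i != liked_index
    ]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]


# Toy catalogue (invented descriptions).
items = [
    "space opera with aliens and starships",
    "romantic comedy set in paris",
    "aliens invade earth starships battle",
    "cooking show with french cuisine",
]

top = recommend(0, items, k=1)
print(top)
```

Here the shared terms with non-zero weight ("aliens", "starships") are a natural, if crude, explanation of why the item was recommended, which is the kind of signal explainable recommenders try to expose.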

Sujet :
Explainable recommendation helps to improve the transparency, persuasiveness, effectiveness, trustworthiness, and satisfaction of recommendation systems. It also helps system designers debug their systems. So far, research on explainable recommendation has focused on the Why question: “Why is an item recommended?”. Solutions either treat the recommendation system as a black box, and thus try to reveal relationships among users and items or the importance of different features with respect to the predicted value, or delve into the intrinsic characteristics of the recommendation system in order to truly explain it. What has not yet been studied, though, is the Why-Not aspect of a recommendation: “Why is a specific item not recommended?”. We argue that explaining why certain items or categories of items are not recommended can be as valuable as explaining why items are recommended. Why-Not questions have recently gained the attention of the research community in multiple settings, e.g., for relational databases. In machine learning, Why-Not questions have been shown to improve the intelligibility of predictions but remain largely unexplored.
In this thesis proposal we aim to explore Why-Not explanations for machine-learning-based explainable recommenders. In a second phase, we aim to extend the recommenders so that they can leverage the Why-Not explanations for auto-tuning.
More information at: https://perso-etis.ensea.fr/tzompanaki/phd_proposal.html

Profil du candidat :
The candidate should hold an MSc degree in a field related to Computer Science, Machine Learning, or Applied Mathematics/Statistics. She/He should have solid knowledge of data management, algorithms and programming. Knowledge of and previous experience in machine learning, recommender systems, or explainability are a plus. She/He should master the English language (oral and written); knowledge of French is not required. She/He must have strong analytical skills, be proactive and self-driven, and be able to collaborate with a group of international researchers.

Formation et compétences requises :
The candidate should hold an MSc degree in a field related to Computer Science, Machine Learning, or Applied Mathematics/Statistics.

Adresse d’emploi :
CY Cergy Paris Université
Site Saint Martin, 2 av. Adolphe Chauvin, Pontoise 95000 France

Document attaché :

Categories: theses

CLOSED-LOOP FLOW CONTROL BY PLASMA DISCHARGE AND MACHINE LEARNING

Apr 30 – May 1 all-day

Offre en lien avec l’Action/le Réseau : – — –/– — –

Laboratoire/Entreprise : Institut Pprime
Durée : 3 ans
Contact : Laurent.Cordier@univ-poitiers.fr
Date limite de publication : 2020-04-30

Contexte :
Framework.
In recent years, continuous progress has been made on the performance of both civilian and military aircraft and helicopters, particularly in terms of flight envelope, radiated noise, maneuverability, vibration, etc. However, further improvements can be achieved by using closed-loop control of the fluid flow around the vehicle. Compared to the mechanical blowing or suction actuators more often used in flow control, plasma actuators have the advantages of being non-intrusive, consuming little energy, and having particularly short reaction times. These actuators are generally composed of a system of electrodes installed on one of the walls of the area to be controlled. By applying a sufficient potential difference between these electrodes, a plasma discharge is generated, inducing an ion wind which creates a flow tangential to the wall in order to accelerate the flow and, above all, to modify the velocity profile in the boundary layer.

Sujet :
Objectives.
The efficiency of plasma actuators depends to a large extent on their positions on the wall, as well as on numerous other control hyper-parameters (number of electrodes, distances between them, potential difference or electrical power, shape of the electrical signal, frequency of the discharge, etc.). The objective of the thesis is to determine these parameters in order to optimize a previously established performance function. For this, a numerical optimization tool coupling simulation of complex electrostatic phenomena and closed-loop control will be developed. The work will be organized in two broadly coupled axes: i) development of efficient control strategies by machine learning, ii) improvement of the understanding and physical modeling of the mechanisms at work.

Work program, methodologies and means.
We propose to numerically derive closed-loop control strategies of different flows. We will treat the numerical simulation aspects of physical mechanisms (electrodynamics and fluid mechanics) and the development of innovative control strategies (Data Driven approaches based on machine learning methods).
We will study two rather emblematic types of flows:
– The flow behind an obstacle (cylinder, wing profile, backward facing step). This type of strongly separated flow is particularly interesting in cases where the objective of the control is to increase aircraft stealth.
– The mixing layer developing at the interaction of two coaxial jets (see Figure). In this application, the objective is to increase the mixing efficiency between the two jets by exciting the mixing layer with plasma discharges.

In the first part of this thesis, we will simulate the plasma discharge by solving the transport equations for the densities of electrons and of the different ionic species present in the gas. We will implement a 3-species model that accounts for the actual physical mechanisms in Oracle 3D, the code developed in the EFD team. In the second part, we will develop closed-loop control strategies based on machine learning methods, mainly neural networks and reinforcement learning.

This subject is supported by a half scholarship awarded by the “Direction Générale de l’Armement”. Additional funding will be requested within the framework of the Labex Interactifs (Pprime). This topic is at the heart of the CNRS Research Group “Flow Control Separations”, whose Director is Laurent Cordier (Pprime).

Profil du candidat :
Master in Fluid Mechanics / Applied Mathematics / Machine Learning. Appetite for interdisciplinary approaches and machine learning. Desire to go beyond the borders.

Formation et compétences requises :
Master in Fluid Mechanics / Applied Mathematics / Machine Learning. Appetite for interdisciplinary approaches and machine learning. Desire to go beyond the borders.

Adresse d’emploi :
Institut Pprime
SP2MI – Téléport 2 – Bâtiment H2
11 boulevard Marie et Pierre Curie
Futuroscope
France

Document attaché : 202003191620_Flow_Control_Plasma_Discharge_Machine_Learning_DGA_EN-Cordier-Traore.pdf

Categories: theses

End-to-end learning of geophysically-sound CNN representations from satellite-derived observation datasets

Apr 30 – May 1 all-day

Annonce en lien avec l’Action/le Réseau : aucun

Laboratoire/Entreprise : IMT Atlantique, Lab-STICC
Durée : 36 months
Contact : ronan.fablet@imt-atlantique.fr
Date limite de publication : 2020-04-30

Contexte :
Understanding, modeling, forecasting and reconstructing fine-scale and large-scale processes and their interactions are among the key scientific challenges in ocean-atmosphere science. State-of-the-art approaches strongly rely on joint research efforts in observing systems (e.g., in situ monitoring, satellite observations) and numerical simulations [e.g., 30]. Despite significant advances in data assimilation, the ability to relate models and observation data remains an open question for numerous processes (e.g., small-scale parameterization, ocean-atmosphere interactions, biogeochemical ocean dynamics, climate-scale dynamics). Artificial Intelligence (AI) technologies, models and strategies open new paradigms to address these questions through the in-depth exploration of the existing observation and simulation big data [1,3,12,20-26].

Among others, the recent breakthrough in the resolution of fine-scale cloud processes in climate models [26] is a striking illustration. It further illustrates the typical learning-based paradigm for ocean-atmosphere processes, where a model or representation is learnt from simulation data. However, for numerous processes, on the one hand, the ability of model simulations to be fully representative of real dynamics is questionable and, on the other hand, one would expect to benefit from the existing observation datasets to extract computational representations. The sampling patterns of these observation datasets (e.g., irregular space-time sampling, partially observed systems, etc.) raise issues which remain to be addressed in order to develop fully observation-driven and learning-based frameworks for earth science, including space oceanography.

This PhD scholarship is open in the framework of AI Chair OceaniX (Physics-informed AI for Observation-driven Ocean AnalytiX) (https://rfablet.github.io/projects/2019-oceanix).

Sujet :
In this context, the general goal of this project is to address the following topical questions:
Can we develop fully observation-driven, learning-based paradigms from satellite-derived observation datasets, including synergies with other observation data (e.g., ARGO floats, buoys, etc.)?
Can learning-based paradigms better inform past and future dynamics from HR satellite-derived observations of the sea surface?
The methodological backbone underlying these topical questions is the definition and identification of learning-based representations of geophysical dynamics. In the framework of the ANR project Melody (2020-2023) and the SWOT ST DIEGO project, the proposed methodological framework will be implemented and demonstrated in the context of the upcoming SWOT mission, with the aim of informing past and future sea surface dynamics from HR SWOT snapshots. Case studies will be designed based on OSSEs (Observing System Simulation Experiments) and real SWOT data.

The PhD candidate will be involved in interdisciplinary scientific collaboration at the interface of Machine Learning, Data Science and Ocean Science, including strong collaborations with space oceanography teams (Dr. B. Chapron, Ifremer/LOPS; J. Le Sommer, CNRS/IGE; A. Pascual, CSIC/IMEDEA) and industrial partners (OceanNext, OceanDataLab).

Link to the detailed PhD project: https://www.imt-atlantique.fr/sites/default/files/rfablet/phd_proposal_rfablet_CNESMelody_2020.pdf

Application by mail to ronan.fablet AT imt-atlantique.fr

Formation et compétences requises :
Besides a strong theoretical background, computer skills are particularly expected, including initial experience with state-of-the-art deep learning frameworks (e.g., TensorFlow, PyTorch) and programming environments (e.g., Python, a git server).

Adresse d’emploi :
IMT atlantique, technopôle Brest-Iroise, Brest

Document attaché :

Categories: theses

Artificial intelligence in the service of learner profiles: targeting, optimization and adaptation of the learning process.