MaDICS

Automatic construction of models from natural-language corpora

Sep 15 – Sep 16 all-day

Announcement linked to Action/Network: none

Laboratory/Company: Sciences, Normes, Décision
Duration: 36 months
Contact: pierre.saurel@paris-sorbonne.fr
Publication deadline: 2018-09-15

Context:
The Sciences, Normes, Décision (SND) team of the Faculté des Lettres at Sorbonne Université, FRE CNRS 3593, conducts interdisciplinary research drawing on cognitive science, data science and decision science.
MTB is a company specialized in developing management information systems. It has built up expertise, methods and software tools for formalizing its clients' requirements, expressed in natural language, and producing a model of these systems. The company has developed a platform that collects the requirements, produces the system model and automatically generates source code implementing it (potentially in any target language: Java, C#, PHP, C, Python, etc.).
The company now wishes to go a step further by automatically identifying models and structural elements from text written in natural language.

Subject:
In this context, we are recruiting, for next September, a CIFRE PhD student to work on the automatic construction of models from natural-language corpora.

Candidate profile:
Computer science, algorithmics, data science
AND/OR
Semantic analysis, text mining, NLP (Natural Language Processing)

The candidate (engineering degree or Master 2) must have strong skills in one of these two areas and a definite interest in the other.

Education and required skills:
Computer science, algorithmics, data science
AND/OR
Semantic analysis, text mining, NLP (Natural Language Processing)

The candidate (engineering degree or Master 2) must have strong skills in one of these two areas and a definite interest in the other.

Work address:
SND (Sorbonne Université, Paris) & MTB (Mairie de Clichy, Paris region)

Attached document:

Categories: theses

3-year fully funded PhD position in the research area of Big Data and Artificial Intelligence (AI)

Sep 30 – Oct 1 all-day

Announcement linked to Action/Network: none

Laboratory/Company: ETIS Lab UMR 8051, Paris, France and the Department of Computer Science, University of Warwick
Duration: 3 years
Contact: Dimitrios.Kotzinos@u-cergy.fr
Publication deadline: 2018-09-30

Context:
Scope and Context
Due to the massive amounts of available data, various critical database tasks, e.g. query answering, are becoming approximate rather than exact. At the same time, the functioning of many critical Big Data system components depends on monitoring and prediction: e.g. in caching subsystems (which items to cache/prefetch), in query optimisation (the best access method to use), and in indexing (when and for which attributes to build indexes). Additionally, big data analytics systems need to be able to decide on the fly which (e.g. matching or optimization) algorithms are most suitable in different cases. Similarly, different prediction models for analytical queries (e.g., regression models) may perform differently on different predictive analytics tasks, so the system must decide on the best model to use. These problems can be approached using predictive modelling adaptation techniques, which are well established in Artificial Intelligence (AI) and Machine Learning (ML). We therefore propose to work towards extending current Big Data management and analysis systems with ML- and AI-based:
* Approximate analytical query processing engines based on ML models, e.g., for queries based on descriptive statistics (COUNT, AVG, SUM, etc.) or on dependence statistics (CORR, CoVar, regressions, etc.). Given the size of current datasets, approximate query answering is one of the solutions we can employ to get responses in reasonable time while providing error feedback and control. We also want to introduce into the system uncertainty models with guarantees on the maximum error and an understanding of the trade-off between error and time/cost during query processing.
* Self-learning capabilities: big data management and analysis systems should be able to learn by monitoring the operations and decisions made so far, and use them to extract useful information in order to optimize various system operations, such as selecting the best possible algorithms, models, etc.
During this PhD we therefore want to investigate the above issues and develop solutions that can be integrated into real-world big data management systems.
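
As a rough illustration of the approximate query answering idea described above, the sketch below estimates AVG from a uniform random sample and reports an approximate error bound. It is not the ML-based engine the position targets: it swaps in plain sampling with a normal-approximation confidence interval, and the function name `approx_avg`, the synthetic column and the 95% level (z = 1.96) are all assumptions made for illustration.

```python
import math
import random
import statistics


def approx_avg(values, sample_size, z=1.96, seed=0):
    """Estimate AVG(values) from a uniform random sample.

    Returns (estimate, half_width). The half-width is a normal-approximation
    confidence bound (about 95% for z=1.96), not a hard maximum-error guarantee.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # sampling without replacement
    estimate = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, z * std_err


# Hypothetical "column" of 200,000 numeric values (synthetic data).
data_rng = random.Random(42)
column = [data_rng.gauss(100.0, 15.0) for _ in range(200_000)]

exact = statistics.fmean(column)                               # full scan
estimate, half_width = approx_avg(column, sample_size=2_000)   # 1% sample

print(f"exact AVG  = {exact:.3f}")
print(f"approx AVG = {estimate:.3f} +/- {half_width:.3f}")
```

Increasing `sample_size` shrinks the reported half-width roughly as 1/sqrt(n), which is the error versus time/cost trade-off in its simplest form.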

We expect the successful applicant to be one of the driving forces behind the newly established collaboration between the two entities mentioned above. The successful applicant will work jointly with Professor Dimitris Kotzinos (ETIS / Paris Seine University) and Professor Peter Triantafillou (Department of Computer Science, University of Warwick) and their respective groups. The position is based at the ETIS lab at the University of Cergy-Pontoise in the greater Paris area, but frequent exchanges and stays at Warwick are envisioned.

Subject:
3-year fully funded PhD position in the research area of Big Data and Artificial Intelligence (AI), in a collaboration between the MIDI team of the ETIS Lab UMR 8051, Paris, France and the Department of Computer Science, University of Warwick, under the supervision of Professor Dimitris Kotzinos and Professor Peter Triantafillou respectively.
(PhD funded under the Paris Seine Initiative of Excellence)

Candidate profile:
Tentative Starting Date: October 2018

Application
If interested, please send your application (including a detailed CV, university transcripts, a copy of the master thesis and/or scientific papers if available, as well as a list of personal references and a motivation letter) in PDF format to Professor Dimitris Kotzinos (Dimitrios.Kotzinos@u-cergy.fr) and Professor Peter Triantafillou (P.Triantafillou@warwick.ac.uk). Further enquiries are also welcome.

Applications are welcome until 20/08/2018 or until the position is filled.

Education and required skills:
A Master 2 (preferably with a research flavor) or equivalent is required.

Work address:
Lab. ETIS UMR 8051
University of Paris-Seine, University of Cergy-Pontoise, ENSEA, CNRS
& Dept. Sciences Informatiques, Université de Cergy-Pontoise
2 av. Adolphe Chauvin
Site Saint Martin, bureau A561
Pontoise F-95302 Cergy-Pontoise
France

Attached document: PhD-descr-INEX.pdf

Categories: theses

Large-scale phenotype identification using semi-supervised learning

Sep 30 – Oct 1 all-day

Announcement linked to Action/Network: none

Laboratory/Company: LIMICS, Sorbonne Université, Inserm
Duration: 3 years
Contact: xavier.tannier@sorbonne-universite.fr
Publication deadline: 2018-09-30

Context:
ISCD / LIMICS / APHP project, funded by a doctoral grant from the Institut des Sciences du Calcul et des Données. Teaching assignments may additionally be considered, depending on the candidate's profile.

Keywords: phenotype identification, cohort selection, semi-supervised learning, neural networks, natural language processing

Subject:
http://xavier.tannier.free.fr/jobs/files/2018_PhD_Phenotypage.pdf

Candidate profile:
Solid computer science background with skills in machine learning and some knowledge of natural language processing. An interest in working with health data.

Education and required skills:
Minimum level: Master's degree, engineering degree or equivalent

Work address:
LIMICS, UMRS 1142
Esc. D, 2ème étage
15 rue de l’école de médecine
75006 PARIS
FRANCE

Attached document: These_2018_Identification_Phenotypes.pdf

Categories: theses

Data intelligence for optimized management within and across hospital emergency departments

Sep 30 – Oct 1 all-day

Announcement linked to Action/Network: none

Laboratory/Company: EA2694 “Santé Publique; Epidémiologie et Qualité des Soins”, Univ-Lille and CHU Lille
Duration: 36 months
Contact: hayfa.zgaya-biau@univ-lille.fr
Publication deadline: 2018-09-30

Context:
This work will be carried out within the ANR project OIILH (Optimisation inter et intra logistique hospitalière), research axis: Technologies for health

Subject:
In managing healthcare delivery systems, controlling hospital flows and anticipating tensions are major challenges, and both depend on how effectively the collected data are processed. Exploiting these massive data makes it possible, in particular, to identify performance indicators for intra- and inter-hospital logistics. Many technical and scientific obstacles remain to be studied, taking human and socio-economic factors into account, for example: how should the most effective performance indicator(s) be chosen for a given context? Which factors contribute to amplifying tension and lengthening waiting times in emergency departments?

Within the ANR OIILH project, we focus on studying indicators of tension in the adult emergency department of the CHR de Lille. These indicators may be already known, such as waiting time and length of stay, or may remain to be identified by mining massive data.

The objective of this thesis is to find an effective classification of indicators (hidden and/or known) that makes it possible to anticipate tension in the emergency department as accurately as possible. Many models and methods exist for this purpose; they fall into two categories: statistical models and methods, and machine learning models and methods. The goal is to generalize, innovate on and adapt these models and methods in order to anticipate the tension that frequently arises in adult emergency departments, whose operation and patient pathways are complex.

Candidate profile:
The candidate must hold a research Master's degree in computer science (preferred), statistics or mathematics.

Education and required skills:
The candidate must hold a research Master's degree in computer science (preferred), statistics or mathematics, with skills in:
– Python and Java programming
– optimization and deep learning

A good level of written and spoken English is required.

Work address:
EA2694 “Santé Publique; Epidémiologie et Qualité des Soins”, Univ-Lille and CHU Lille.

Attached document: Sujet-de-these-ANR-OIILH2018_EA2694.pdf

Categories: theses

Multi-modal Urban Transport Modelling and Analysis via Complex Networks, Machine Learning and Big Data Processing

Sep 30 – Oct 1 all-day

Announcement linked to Action/Network: none

Laboratory/Company: LICIT (IFSTTAR-ENTPE)
Duration: 3 years
Contact: angelo.furno@ifsttar.fr
Publication deadline: 2018-09-30

Context:
We are looking for an enthusiastic PhD candidate to carry out research in the context of the ANR-funded JCJC project PROMENADE (Platform for Resilient Multi-modal Mobility via Multi-layer Networks & Real-time Big-Data Processing).
The goal of PROMENADE is to devise a novel systemic, real-time data-driven platform for efficient, resilient and smart management of multi-modal urban transport, by integrating innovative and sustainable solutions based on complex networks modelling, machine learning and big data technologies.
The scope of the thesis is to propose and devise an innovative, data-driven, dynamic graph-based framework for modelling and analysing multi-modal, large-scale urban transport networks in order to study their vulnerabilities and resilience.
The subject lies at the interface of machine learning, big data processing, network science and transportation.

Subject:
The PhD thesis will explore complex networks, big data and machine learning methodologies and technologies aimed at grasping the complex inter-relationships existing among multiple transport modes (e.g., private cars, bus lines, subways, trams, railway, etc.) and allowing for a more accurate estimation of their inner properties (vulnerabilities, network clusters, inter-dependencies, traffic dynamics, etc.).
Studying networks individually has been recognized as an extremely crude approximation of reality, hiding crucial structural and dynamic properties of the modelled system [1, 2]. In [3], the authors showed that real-world interconnected networks are not independent: their coupling can have critical consequences and deeply affects the global behavior of a system. Traditional graph-theory approaches are therefore unable to identify, anticipate and mitigate network vulnerabilities, or to prevent cascading failures due to minor events or hardly predictable major events [2]. Additionally, such inter-relationships evolve very rapidly due to frequent changes in user demand and behavior, network supply and external factors (e.g., weather and social events).
Very recently, multiplex representations have been successfully considered for studying network failures, traffic congestion and efficiency in multi-modal urban transportation systems.
The thesis will respond to the need to overcome traditional static and mono-modal approaches to modelling and analyzing urban transport networks and their interactions, by targeting a novel framework based on multi-layer networks [1, 2] that captures the complex and dynamic interactions among multiple transport modes, urban infrastructures (e.g., land, telecommunications, water system, power grid, etc.), and urban actors (i.e., network providers, users, planners and operators). Such a modelling framework should be able to deal with the complex organization of real-world, multi-modal systems and with the need to quantify the interplay among their different actors and components.
The modelling will be tackled according to a data-oriented, large-scale, and real-time perspective.
Road networks and land maps for the city of Lyon will be retrieved with detailed information (e.g., BD Topo) via the academic partnership existing between IFSTTAR and the National Geographic Institute (IGN). Information on public transport networks and bike sharing systems will be available through an ongoing and consolidated collaboration between IFSTTAR and SYTRAL-Keolis Lyon as well as via open data available on the Data Grand Lyon platform. Traffic data will be available via collaborations with multiple industrial partners (Orange Labs, Mediamobile, SYTRAL-Keolis, etc.). Similarly, for the case of Paris, open-data from RATP will be leveraged to reconstruct the public network topology and its supply information. Crowd-sourced data from OpenStreetMap will also be considered.
The PhD thesis will focus on i) enhancing the multi-layer representation with real-time data and mobility patterns, ii) devising novel metrics that can reveal complex properties of the multi-modal urban network related to transport resilience, and iii) providing efficient large-scale implementations of the proposed resilience/vulnerability metrics adapted to multi-layer networks, towards continuous monitoring of network vulnerability.
The PhD position offers the opportunity to work in a multi-disciplinary research environment and to access real-world datasets, including large-scale multi-modal transport networks, multi-source traffic data (GPS traces, loop data, Bluetooth data, etc.), and mobile network traffic demands collected in an operational, large-scale French cellular network.
It is expected that the successful candidate will contribute to top-tier computer networks, self-adaptive distributed systems, big data and transportation-related conferences and journals (IEEE INFOCOM, IEEE ICDM, ACM SIGKDD, IEEE Big Data, IEEE Transactions on Autonomous and Adaptive Systems, Transportation Research Board, IEEE Intelligent Transportation Systems, Transportation Research, etc.).

Candidate profile:
We are looking for strongly motivated candidates with a solid background in computer science, mathematics and probability/statistics. Candidates with proven skills in the fields of big data, network mining and distributed programming will be preferred. Knowledge of complex networks and machine learning theory will be strongly appreciated. Programming skills in Scala, Java, Python or R are desired.
Proven written and verbal communication skills, with fluency in written and spoken English, are required.

Education and required skills:
Level of qualifications required: graduate degree or engineering degree in Computer Science or a closely related field;
Starting date: 2018-10-01;
Duration of contract: 3 years;
Deadline to apply: applications will be reviewed until the position is filled;
Main Location: Lyon, France (LICIT, IFSTTAR/ENTPE): https://goo.gl/maps/K19HBR4ETZ92
Project Team:
Advisor: Angelo FURNO (CR), LICIT, University of Lyon, ENTPE, IFSTTAR
Co-advisors:
Nour-Eddin EL FAOUZI (DR), University of Lyon, ENTPE, IFSTTAR
Eugenio ZIMEO (PR), University of Sannio, Italy
PhD Committee: Marco FIORE (CNR-Italy), Razvan STANICA (INSA-Lyon), Zbigniew SMOREDA (Orange Sense Labs), Sybil DERRIBLE (University of Illinois, Chicago).

Work address:
Ifsttar – Lyon-Bron
25, avenue François Mitterrand, Case24
Cité des mobilités.
F- 69675 Bron Cedex

Attached document: Call-for-PhD-IFSTTAR-ENTPE-multi-modal-network-resilience.pdf

Categories: theses

Robust Traffic Engineering for Software-Defined Networks

Sep 30 – Oct 1 all-day

Announcement linked to Action/Network: none

Laboratory/Company: Mathematical and Algorithmic Sciences Lab France Research Center, Huawei Technologies Co. Ltd.
Duration: 3 years
Contact: jeremie.leguay@huawei.com
Publication deadline: 2018-09-30

Context:
The Network and Traffic Optimization research team of the Mathematical and Algorithmic Sciences Lab, Huawei France Research Center, located in the Paris area, is looking for highly motivated candidates for a PhD thesis on Network Optimization. The thesis will be jointly supervised with Telecom Sud Paris within the CIFRE framework.

Subject:
In recent years, the control paradigm of Software Defined Networking (SDN), which was originally targeting enterprise and data-center networks, has gained momentum. Telecom Operators, and Cloud Service Providers (CSP) have been building wide area overlays managed by a centralized SDN controller in order to provide worldwide, long-haul and cost effective services. MPLS leased lines, best effort Internet, and PoP (Point of Presence) overlays are used to interconnect enterprise branch offices to cloud infrastructures. The dynamic nature of the traffic as well as the uncertain properties of the IP transit links make the management of these systems challenging. Ideally, the network should be dynamically reconfigured as the system evolves. However, reconfigurations cannot be too frequent due to route stability, forwarding rules instantiation, individual flows dynamics, traffic monitoring overhead, etc.
Motivated by the need to deal with uncertainty, this thesis aims to investigate and design new algorithms for the offline planning and online control of SDN systems. Robust and stochastic optimization are the natural choices for modeling an SDN system under multiple traffic and connectivity/capacity scenarios. Although they consider uncertainty in similar ways, robust and stochastic optimization generally produce different outcomes. Stochastic optimization produces results that are optimal for the average case. In contrast, robust optimization considers specific “unlucky cases” to compute a solution.
The PhD thesis will focus on the modeling and algorithm design of the offline network planning and online control problems using approaches based on robust and/or stochastic optimization.
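
To make the contrast between the two approaches concrete, here is a toy sketch that sizes a single link under three demand scenarios. Everything in it is an assumption made for illustration (the cost model, the scenario probabilities and demands, and the names `cost`, `expected_cost` and `worst_case_cost`); it is a brute-force search over a handful of candidates, not a formulation taken from the thesis proposal.

```python
# Toy capacity-planning problem: pick one link capacity before the
# traffic demand is known. All numbers below are made up.

def cost(capacity, demand, unit_cost=1.0, penalty=4.0):
    """Provisioning cost plus a penalty for every unit of unserved demand."""
    return unit_cost * capacity + penalty * max(0.0, demand - capacity)


# (probability, demand) pairs describing the uncertain traffic.
scenarios = [(0.6, 10.0), (0.3, 14.0), (0.1, 25.0)]
candidates = range(0, 31)  # candidate integer capacities


def expected_cost(capacity):
    """Stochastic objective: probability-weighted average over scenarios."""
    return sum(p * cost(capacity, d) for p, d in scenarios)


def worst_case_cost(capacity):
    """Robust objective: the most expensive scenario, ignoring probabilities."""
    return max(cost(capacity, d) for _, d in scenarios)


stochastic_choice = min(candidates, key=expected_cost)
robust_choice = min(candidates, key=worst_case_cost)

print("stochastic:", stochastic_choice, expected_cost(stochastic_choice))
print("robust:    ", robust_choice, worst_case_cost(robust_choice))
```

With these made-up numbers the stochastic objective selects a capacity of 14, accepting a shortfall in the rare high-demand scenario, while the robust objective selects 25 so that even the "unlucky case" is fully served.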

Candidate profile:
Ideal candidates should have a Master's degree in Telecommunications, Computer Science, or Applied Mathematics from a University or a Grande Ecole. They should have a solid background in Operations Research. Knowledge of telecommunications will be appreciated.

Education and required skills:
English: Operational

Work address:
Contacts
– Huawei FRC: Dr. Stefano PARIS (stefano.paris@huawei.com), and Dr. Jeremie LEGUAY (jeremie.leguay@huawei.com)
– Telecom Sud Paris: Prof. Walid Ben-Ameur (walid.benameur@telecom-sudparis.eu)
Application
To apply please send a complete CV, a motivation letter, grades of University/Grande Ecole studies, and references.

Attached document: PhD-on-Network-Optimization-Operations-Research.pdf

Categories: theses

CIFRE PhD thesis (RENAULT-LIPN): Multi-block clustering and visual analytics of massive sequential data from autonomous vehicle simulation

Sep 30 – Oct 1 all-day

Announcement linked to Action/Network: none

Laboratory/Company: Laboratoire d’Informatique de Paris-Nord (LIPN)
Duration: 3 years
Contact: mustapha.lebbah@univ-paris13.fr
Publication deadline: 2018-09-30

Context:
In a rapidly changing context, the automotive world is evolving fast: cars are becoming electric, connected and autonomous. In particular, the Renault-Nissan Alliance, the world's leading car manufacturer, is taking up the challenge of designing SAE level 4 (out of 5) autonomous vehicles by 2022.

The deployment of autonomous vehicles relies on developing robust, high-performance embedded systems (control laws, sensors, data fusion and vehicle dynamics). They must remain robust to external disturbances and failures, and they must also perform well against road-safety and passenger-comfort criteria. These studies must be carried out on a large number of driving scenes forming a representative sample of all the situations and conditions an autonomous vehicle may encounter.

Given the large number of cases to study, massive numerical simulation becomes necessary. The Renault-Nissan Alliance wants to set up a massive simulation tool that simulates several hundred million kilometers (Mkm) per simulation plan. This massive simulation platform is intended to support the development of embedded systems from their design through to their functional validation.

Subject:
The massive simulation platform makes it possible to run simulation plans and to store a large number of variables produced by several hundred million kilometers of simulated driving. These raw data can be of different kinds:

– Metadata characterizing the scenarios, the vehicles and their variability (~ 40 variables)

– External time-dependent variables from the autonomous vehicle's environment (~ 50 variables)

– Internal time-dependent variables from the system models (control laws, sensors, data fusion and vehicle dynamics) of the autonomous vehicle (several thousand variables)

This platform will include a function for automatically analyzing the causes of autonomous vehicle malfunctions. Two levels of analysis will be distinguished. The first is an analysis by external causes: it describes the malfunction in terms of elements external to the autonomous vehicle, such as road characteristics, weather conditions or the behavior of other vehicles. The second is an analysis by internal causes: it describes the malfunction in terms of the internal signals exchanged between the different system models (control laws, sensors, data fusion and vehicle dynamics) of the autonomous vehicle. Finally, combining the external-cause and internal-cause analyses should make it possible to identify the root causes of autonomous vehicle malfunctions.

To represent this large amount of data and, above all, to understand it better, techniques from sequential data mining and massively distributed statistical learning are needed.

The following challenges will therefore need to be addressed:

1/ Visual analytics of large data volumes is inherently more complex. Large-scale projection and visualization techniques will need to be considered

2/ Multi-block clustering of sequential data with variable selection

3/ Identification of the root causes of autonomous vehicle malfunctions

4/ Processing of simulation data volumes with the new big-data platforms

Candidate profile:
The candidate must have a good grounding in mathematics, statistics and algorithmics. Experience in processing massive data with innovative platforms is desirable.

Education and required skills:
M2/engineering degree

Work address:
RENAULT Technocentre

Attached document:

Categories: theses

New prediction and planning algorithms for digital learning based on optimization methods

Oct 15 – Oct 16 all-day

Announcement linked to Action/Network: none

Laboratory/Company: CRIStAL / Mandarine Academy
Duration: 36 months
Contact: laetitia.jourdan@univ-lille1.fr
Publication deadline: 2018-10-15

Context:
Type of funding: CIFRE thesis with Mandarine BS http://mandarine.academy/
Duration: 36 months
Contact: pamela.wattebled@mandarine.academy, Julie.jacques@univ-catholille.fr, marie-eleonore.kessaci@univ-lille.fr, laetitia.jourdan@univ-lille.fr
Laboratory: CRIStAL UMR CNRS

Subject:
In this thesis we will work on designing and deploying a predictive model for digital learning. The model will make predictions about logistics and will, in particular, help determine which topic to promote, together with the organization and optimization of technical and human resources.
It must also predict training needs and support the analysis of client expectations by sector, occupation and profile, in order to invest in and highlight what clients expect. The model may, in particular, exploit data on the standard training courses most searched for on the internet.
It will also make predictions about content: dynamic content that generates itself according to the individual, and automatically created communications targeting the user's expectations. One example is micro-learning content embedded in communications or merged into a single video that follows the user's progress, on the same principle as gamebooks (« les livres dont vous êtes le héros »).
As a first step, the objective is to provide a predictive model based on learners' profiles and learning paths. This model must make it possible to suggest new courses suited to the learner. The problem will be modeled as an optimization problem so that, in a second step, new constraints can be added on courses, resource availability, etc.
In a second part, logistics management and course scheduling will be studied and models proposed to take into account the many constraints on the resources of the company Mandarine BS (material and human resources).

Candidate profile:
Profile sought:
Five-year higher-education degree (Bac+5) in computer science, or in mathematics with computer science courses

Education and required skills:
Technical skills: C/C++, PHP, MySQL/SQL; Symfony2 would be a plus
Knowledge: machine learning, optimization, algorithmics
Languages: French, English (B2 minimum)

Work address:
Send the following documents by email to the three contacts for this subject, with the subject line “Candidature thèse Mandarine BS”: CV, cover letter, M1 and M2 grades, class ranking

Attached document:

Categories: theses

Big Data Series Analytics in the Context of Environmental Crowd Sensing

Oct 23 – Oct 24 all-day

Announcement linked to Action/Network: Doctorants

Laboratory/Company: DAVID Lab – University of Versailles
Duration: 3 years
Contact: Karine.Zeitouni@uvsq.fr
Publication deadline: 2018-10-23

Context:
With the recent development of advanced computing and communication technologies, the world is witnessing the rise of the so-called Internet of Things (IoT). IoT envisions a world where everything is connected, from humans and computing devices to animals, vehicles, and even the smallest appliances. Sensors and actuators are attached to things, enabling them to sense, generate data, communicate, act, and share information. This is leading to the generation of massive amounts of data, now regarded as Big Data, or Big Sensing Data in the IoT context. Given the great potential embedded in these data, both industry and academia are rushing to develop methods and technologies that can not only handle this large amount of data but also exploit it to mine new knowledge and insights.
One application of IoT is the monitoring of air pollution. Several research initiatives have used fixed air pollution sensors to monitor air quality [1]. However, fixed sensors fall short in modeling air quality because of the high spatiotemporal variability of air pollutants. That is why the community is shifting toward a new monitoring paradigm, namely mobile crowd sensing, which empowers volunteers to contribute data acquired by their personal sensor-enhanced mobile devices [2]. This is enabled by emerging low-cost, lightweight air pollution sensor boxes, which can be carried by pedestrians and cyclists or mounted on vehicles. Opportunistic air quality monitoring takes advantage of existing mobile infrastructure or people's common daily routines to perform monitoring [3].
This paradigm has several advantages compared to conventional monitoring techniques. First, it promotes personalization, where each individual is able to gain insights into his/her exposure. Second, it measures indoor and outdoor environments (Home, Work, Transportation, Streets, Parks, etc.) and expands the spatial coverage, depending on the participants' whereabouts. Finally, it enables insights at a higher resolution along the participants' trajectories, thereby making it possible to capture local variability and peaks of pollution. Nevertheless, the main limitation of opportunistic sensing arises from its uncontrolled sampling nature, leading to highly uneven data density across regions and times of the day. Mining such inhomogeneous samples inherently raises unique challenges that we intend to tackle in this thesis. From the perspective of the study of daily exposure, typical exposure profiles could be mined from the longitudinal data set. However, there is a gap to fill between the raw sensor data series and high-level profiles.

Subject:
While the Mobile Crowd Sensing paradigm has opened the door to new possibilities, it has also generated some challenges [2]. Indeed, the nomadic nature of sensors and their combination (air pollution is often monitored using multi-sensor devices) call for revisiting the traditional methods of data mining and knowledge extraction. These sensors typically produce multivariate time series where one variable is the geographical position of the device (we call these complex data series). Nevertheless, exploiting such complex data series for analytical purposes, such as exploratory analysis using data mining techniques, is far from straightforward. Since raw sensor data are mostly noisy and acquired at irregular (and asynchronous) frequencies, direct use of state-of-the-art methods, such as time series analysis and mining, is insufficient. Besides, to take full advantage of these data, they should not only be analyzed in isolation, but also matched with their context and analyzed along multiple dimensions and scales (e.g., spatial, user, micro-environment, time dimensions). This raises one of the challenges: how to move from raw, heterogeneous complex data series to such high-level models.
Moreover, going further in exploiting the personalization aspect of opportunistic mobile sensing enables individuals to relate air pollution to themselves [4], and to act upon the insights gained. For example, an individual may change his/her daily routes, means of transportation, or even activities for the sake of lower exposure and reduced health effects. Nonetheless, this requires building individual profiles and correlating them with personal health data and activities. This correlation opens the way to highlighting potential causal relations, or to inferring exposure from an activity profile or a planned route.
In this thesis, we aim at developing data mining methods adapted to opportunistic samples of geodated series along with associated contextual data on the one hand, and studying multi-dimensional exploratory analysis and aggregation of such data on the other hand.
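
As an illustration of the irregular, asynchronous sampling described above, the following minimal sketch averages each sensor's readings into fixed time bins and aligns the sensors on a shared grid. It is only one simple preprocessing option, not the method proposed in the thesis; the sensor names, the readings, and the 60-second step are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def resample_mean(samples, step):
    """Average irregular (timestamp, value) samples into bins of `step` seconds.

    Returns {bin_start: mean_value}; bins without samples are simply absent.
    """
    bins = defaultdict(list)
    for t, v in samples:
        bins[(t // step) * step].append(v)
    return {start: mean(vals) for start, vals in sorted(bins.items())}


def align(series_by_sensor, step):
    """Put several asynchronous series on one grid; missing cells become None."""
    resampled = {name: resample_mean(s, step) for name, s in series_by_sensor.items()}
    grid = sorted({t for r in resampled.values() for t in r})
    return [(t, {name: r.get(t) for name, r in resampled.items()}) for t in grid]


# Hypothetical readings: (seconds since start, measured value)
readings = {
    "no2": [(3, 40.0), (9, 44.0), (71, 60.0)],
    "pm25": [(5, 12.0), (65, 18.0), (130, 15.0)],
}
rows = align(readings, step=60)
for t, values in rows:
    print(t, values)
```

Gaps are kept explicit (as None) rather than interpolated, so that later analysis can account for the uneven data density instead of hiding it.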

Profil du candidat :

– Good background in data mining and machine learning
– Strong programming, system, and database skills
– Good oral communication and technical reading and writing skills in English
– Proficiency in French is desirable.

Formation et compétences requises :
The applicant should hold a Master's degree in Computer Science, or equivalent.

Adresse d’emploi :
Hosting laboratory:
DAVID Lab/ADAM Team, University of Versailles St-Quentin / Paris-Saclay University: www.david.uvsq.fr
Located in the city of Versailles
Doctoral school: https://www.universite-paris-saclay.fr/en/doctorate

Document attaché : PhD_Proposal_Polluscope_Versailles_France.pdf

Categories: theses

Thu

PhD offer: Machine Learning for remote sensing multi-modal imagery with application to land cover mapping

Oct 25 – Oct 26 all-day

Annonce en lien avec l’Action/le Réseau : aucun

Laboratoire/Entreprise : CESBIO
Durée : 3 ans
Contact : dino.ienco@irstea.fr
Date limite de publication : 2018-10-25

Contexte :
Dear colleagues,

CESBIO offers a PhD position to work on machine learning algorithms for
the fusion of multi-modal imagery with application to land cover
mapping. The details are given here:

https://mycore.core-cloud.net/index.php/s/CIM3LdxcarS2fMb

Candidates should send an e-mail to jordi.inglada@cesbio.eu containing:
1. Full CV
2. Letter of interest
3. Contact information for 2 references

Sujet :
Machine learning algorithms for
the fusion of multi-modal imagery with application to land cover
mapping.

Profil du candidat :
Machine Learning
Data Science
Remote Sensing
Signal and Image Processing

Formation et compétences requises :
Machine Learning
Data Science
Remote Sensing
Signal and Image Processing

Adresse d’emploi :
Centre d’Etudes Spatiales de la BIOsphère (CESBIO)
Toulouse

Document attaché :

Categories: theses

Tue

PhD Position on Network visibility with Machine Learning

Oct 30 – Oct 31 all-day

Annonce en lien avec l’Action/le Réseau : aucun

Laboratoire/Entreprise : Huawei Technologies
Durée : 3 ans
Contact : jeremie.leguay@huawei.com
Date limite de publication : 2018-10-30

Contexte :
The Network and Traffic Optimization research team of the Mathematical and Algorithmic Sciences Lab, Huawei France Research Center, located in the Paris area, is looking for highly motivated candidates for a PhD thesis on Network Traffic Analysis. The thesis will be jointly supervised with INRIA within the CIFRE framework.

Sujet :

According to NSS Labs, 55% of internet traffic is already encrypted. That number is expected to increase to around 75% by 2019. As a consequence, network service providers are “going dark” as every bit of encrypted data crossing their network looks the same. They can no longer protect, prioritize, and optimize traffic efficiently. Gaining visibility into encrypted traffic has become critical for network operators.

The analysis of encrypted traffic faces two main challenges. First, labeled examples are scarce and difficult to obtain: labeling requires either analyzing flows with heavy inspection methods or having a priori knowledge of the traffic, and supervised learning methods do not generalize well when trained on few samples. Second, new applications may appear over time, and old applications may change their behavior. In this context, traditional supervised methods, which map unseen flow instances onto one of the known classes, cannot detect new types of flows. Indeed, models trained on older versions of applications often make poor and ambiguous decisions when faced with more recent or new applications, a phenomenon commonly known as concept drift.

To overcome these challenges, the PhD thesis will focus on semi-supervised learning techniques that use the available labeled data about known past behaviors to detect drifts (or changes) in unlabeled data that become available later. It will focus more particularly on two key problems: online change detection and the adaptation of classifiers under concept drift.
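
To make the notion of online change detection concrete, here is a minimal sketch of the classic Page-Hinkley test, which raises an alarm when a monitored statistic drifts upward from its running mean. It is a standard textbook detector chosen for illustration, not the approach of the thesis; the monitored stream (for instance a classifier's error indicator), `delta`, and `threshold` are assumptions.

```python
class PageHinkley:
    """Page-Hinkley test for detecting an upward shift in a stream's mean."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm level
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0              # cumulative deviation from the running mean
        self.min_cum = 0.0          # smallest cumulative deviation seen so far

    def update(self, x):
        """Consume one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


def first_drift(stream, detector):
    """Return the index of the first alarm, or None if no alarm is raised."""
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


# Hypothetical stream: a stable regime followed by a shifted one at index 50.
stream = [0.1] * 50 + [0.9] * 50
detected = first_drift(stream, PageHinkley())
print("drift signalled at index", detected)
```

A detector of this kind only flags that something changed; adapting the classifier afterwards (retraining, reweighting, or adding a class) is the second problem named above.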

Profil du candidat :
Specific Requirements
Ideal candidates should have a Master's degree in Telecommunications, Computer Science, or Applied Mathematics from a University or a Grande Ecole. They should have a solid background in Machine Learning. Knowledge of telecommunications will be appreciated.
English: Operational

Formation et compétences requises :
Ideal candidates should have a Master's degree in Telecommunications, Computer Science, or Applied Mathematics from a University or a Grande Ecole.

Adresse d’emploi :
Contacts
– Huawei FRC: and Dr. Jeremie LEGUAY (jeremie.leguay@huawei.com)
– INRIA: Dr. Renata Teixeira (renata.teixeira@inria.fr) and Prof. Vassilis Christophides (vassilis.christophides@inria.fr)
Application (deadline: October 15th)
To apply please send a complete CV, a motivation letter, grades of University/Grande Ecole studies, and references.
Huawei
The Huawei France Research Center (FRC) located in Boulogne-Billancourt, Paris area, is responsible for advanced research in the fields of Algorithm and Software design, Aesthetics, MBB & Home devices and Parallel Computing, to create and design the innovative technologies and software platforms for our Brand.

Document attaché : PhD-on-Traffic-Analysis-Machine-Learning.pdf

Categories: theses

Nov

Fri

Biophysical segmentation with neural networks applied to multispectral satellite imagery

Nov 30 – Dec 1 all-day

Annonce en lien avec l’Action/le Réseau : aucun

Laboratoire/Entreprise : CEDRIC – Cnam (EA4629)
Durée : 3 ans
Contact : michel.crucianu@cnam.fr
Date limite de publication : 2018-11-30

Contexte :
See attached document.

Sujet :
See attached document.

Profil du candidat :
The candidate must have good knowledge of statistical learning in general and of deep learning in particular. Knowledge of signal and image processing is also required.

Applications must include a cover statement, a detailed CV, and transcripts of the grades obtained during the Master's degree. Applications should be sent to michel.crucianu@cnam.fr, marin.ferecatu@cnam.fr, mihai.datcu@dlr.de and Sebastien.DORGAN@c-s.fr

Formation et compétences requises :
Master 2 (second-year Master's) in statistical learning, artificial intelligence, or signal and image processing. Proficiency in at least one deep learning framework.

Adresse d’emploi :
Cnam, 2 rue Conté, 75003 Paris

Document attaché : theseCS18.pdf

Categories: theses

Dec

Sat

Deep learning and transfer learning for building predictive systems for maintenance and continuous improvement with complex, heterogeneous, real-time data

Dec 1 – Dec 2 all-day

Annonce en lien avec l’Action/le Réseau : aucun

Laboratoire/Entreprise : LIASD, IUT de Montreuil / MOM Packaging
Durée : 3 ans
Contact : m.lamolle@iut.univ-paris8.fr
Date limite de publication : 2018-12-01

Contexte :
CIFRE thesis
With the aim of further improving its customer services, MOM Packaging (www.MOM-packaging.com) wishes to develop a decentralized service for predicting failures on the machines it manufactures. Since the rollout of the “SmartConnect” solution, which lets MOM Packaging deploy a “Big Data” solution on its machines and connect to them remotely, large-scale real-time collection of customer data from the remote machines has been under development. This solution, based on Spark and Kafka, is coupled with SmartConnect to record the data streams sent continuously and in real time by all the controllers (PLCs) of the various machines, for all of MOM Packaging's connected customers.
Each customer has its own database containing the history of the operations performed by the controllers at every instant. These data will be analyzed in real time in order to predict possible machine failures. Failure prediction is a major challenge for MOM Packaging and is regarded as a future service to offer its customers, as part of the company's quality approach. Depending on the type of failure, a dedicated recommendation service will be offered to the targeted customers. For the customer, failure prediction saves time and money, since the intervention plan will be proposed so as to minimize the risk of a complete shutdown of one or more machines. In addition, targeted prediction should help keep the machines in service as long as possible. These last two objectives are clearly in line with the sustainability approach of MOM Packaging Marketing.

Sujet :
The objectives of the thesis therefore cover the following problems:
– design a failure prediction system by analyzing different data sources such as machine profiles, customer profiles, technician and operator profiles, and the failure history. This same system must be able to recommend targeted failure-assistance protocols. The prediction system must operate robustly and efficiently despite the heterogeneity of the data and/or representations. In particular, real-time images from the electrical cabinets at customer sites should also be part of the prediction model;
– enrich the prediction system with a failure simulator involving virtual reality and augmented reality techniques. Combining the real and virtual prediction approaches benefits machines that are particularly fragile “mechanically”;
– adapt production, and the intelligent and efficient renewal of machines in production, according to the failure predictions;
– propose an automated prediction management process integrating “Blockchain” technology. A blockchain is a secure record of transactions gathered into blocks. These blocks will be grouped in chronological order and distributed across different servers to provide reliable provenance. Through digital signatures, the blockchain will offer a consensus mechanism allowing the partners (customer, MOM Packaging) to agree on valid transactions. The contribution of this technology to improving pricing accuracy, reducing administrative costs, improving the maintenance service, and minimizing customer complaints will be studied;
– ensure a reliable and continuous intervention service in the event of a failure by proposing a solution based on a visual chatbot. This solution must exploit the virtual 3D components of the machines, the failure catalogue, and the associated pre-recorded repair processes to help both MOM Packaging and the customer resolve the failure autonomously.
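
The blockchain item above describes transactions gathered into chronologically ordered blocks that provide reliable provenance. The sketch below shows only that core idea, a hash-linked chain whose integrity can be re-checked; it deliberately omits digital signatures, distribution across servers, and consensus. The field names and the sample prediction records are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(body):
    """Deterministic SHA-256 digest of a block body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_block(chain, transactions):
    """Append a block that commits to the hash of the previous block."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"index": len(chain), "prev": prev, "transactions": transactions}
    chain.append({**body, "hash": block_hash(body)})


def is_valid(chain):
    """Recompute every hash and link; any edit to past data breaks the chain."""
    prev = GENESIS
    for i, block in enumerate(chain):
        body = {k: block[k] for k in ("index", "prev", "transactions")}
        if block["index"] != i or block["prev"] != prev or block["hash"] != block_hash(body):
            return False
        prev = block["hash"]
    return True


# Hypothetical prediction records shared between the customer and the vendor.
chain = []
append_block(chain, [{"machine": "M-01", "prediction": "bearing wear"}])
append_block(chain, [{"machine": "M-02", "prediction": "no fault"}])
ok_before = is_valid(chain)

chain[0]["transactions"][0]["prediction"] = "no fault"  # tamper with history
ok_after = is_valid(chain)
print(ok_before, ok_after)
```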

Profil du candidat :
Research Master's degree in computer science, machine learning, or signal processing.

Formation et compétences requises :
• Research Master's degree in computer science, machine learning, or signal processing.
• English, with good oral and written skills.
• Experience with the programming languages C++, CUDA, Python, and Java.
• Experience with the Spark Big Data platform.
• Experience with a machine learning framework (TensorFlow, MLlib).

Adresse d’emploi :
MOM Packaging, 19 Allée Louis Breguet, 93420 Villepinte
IUT de Montreuil, 140 rue de la Nouvelle France, 93100 Montreuil

Document attaché : TheseVersionAdiffuser.pdf

Categories: theses

Dec

Sun

PhD Position on Network visibility with Machine Learning

Dec 30 – Dec 31 all-day

Annonce en lien avec l’Action/le Réseau : aucun

Laboratoire/Entreprise : Huawei Technologies
Durée : 3 ans
Contact : jeremie.leguay@huawei.com
Date limite de publication : 2018-12-30

Contexte :
The Network and Traffic Optimization research team of the Mathematical and Algorithmic Sciences Lab, Huawei France Research Center, located in the Paris area, is looking for highly motivated candidates for a PhD thesis on Network Traffic Analysis. The thesis will be jointly supervised with INRIA within the CIFRE framework.

Sujet :
According to NSS Labs, 55% of internet traffic is already encrypted. That number is expected to increase to around 75% by 2019. As a consequence, network service providers are “going dark” as every bit of encrypted data crossing their network looks the same. They can no longer protect, prioritize, and optimize traffic efficiently. Gaining visibility into encrypted traffic has become critical for network operators.

Profil du candidat :
Ideal candidates should have a Master's degree in Telecommunications, Computer Science, or Applied Mathematics from a University or a Grande Ecole.

Formation et compétences requises :
They should have a solid background in Machine Learning. Knowledge of telecommunications will be appreciated.

Adresse d’emploi :
18 quai du point du jour, 92100 Boulogne-Billancourt

Document attaché : PhD-on-Traffic-Analysis-Machine-Learning.pdf

Categories: theses

Dec

Mon

Graph-based learning from integrated multi-omics and multi-species data (data science/bioinformatics) IFP Energies nouvelles/CentraleSupélec

Dec 31 2018 – Jan 1 2019 all-day

Annonce en lien avec l’Action/le Réseau : aucun

Laboratoire/Entreprise : IFP Energies nouvelles/CentraleSupélec
Durée : 3 ans
Contact : laurent.duval@ifpen.fr
Date limite de publication : 2018-12-31

Contexte :
Micro-organisms are studied here for their application to bio-based chemistry from renewable sources. Such organisms are driven by their genome expression, with very diverse mechanisms acting at various biological scales and sensitive to external conditions (nutrients, environment). The emergence of novel high-throughput experimental technologies provides complementary omics data and, therefore, a better capability for understanding the studied biological systems. Innovative analysis methods are required for such highly integrated data. Handling them increasingly requires advanced bioinformatics, data science, and optimization tools to provide insights into the multi-level regulation mechanisms (Editorial: Multi-omic data integration).

Sujet :
The main objective of this subject is to offer an improved understanding of the different regulation levels in the cell (from model organisms to Trichoderma reesei strains). The underlying prediction task requires the normalization and integration of heterogeneous biological data (genomic, transcriptomic, and epigenetic) from different microorganisms. The path chosen is that of graph modelling and network optimization techniques, which allow data of different natures to be combined and biological a priori to be incorporated (in the line of the BRANE Cut and BRANE Clust algorithms). Learning models relating genomic and transcriptomic data to epigenomic traits could be associated with network inference, source separation, and clustering techniques to achieve this aim. The methodology would build on a wealth of techniques developed over graphs for scattered data and social networks. Attention will also be paid to novel evaluation metrics, as their standardization remains a crucial issue in bioinformatics. A preliminary internship position (summer/fall 2018) is suggested before starting the PhD program. Information at:
http://www.laurent-duval.eu/lcd-2018-intern-phd-epigenetics-omics-graph-processing.html

Publications:
– A. Pirayre, C. Couprie, L. Duval, F. Bidard, J.-C. Pesquet, BRANE Cut: biologically-related a priori network enhancement with graph cuts for gene regulatory network inference, BMC Bioinformatics, 2015
– A. Pirayre, C. Couprie, L. Duval, J.-C. Pesquet, BRANE Clust: Cluster-Assisted Gene Regulatory Network Inference Refinement, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2018
– D. Seux, F. D. Malliaros, A. Papadopoulos, M. Vazirgiannis, Core Decomposition of Uncertain Graphs Using Representative Instances, International Conference on Complex Networks and Their Applications, 2017

Profil du candidat :
Engineering school, Master of Science in data science/bioinformatics or related disciplines

Formation et compétences requises :
Bioinformatics, Data Science, Optimization, Statistics, Applied Mathematics, Graph data processing, Gene network inference, Transcriptomics

Adresse d’emploi :
1 avenue de Bois-Préau, F-92852 Rueil-Malmaison, France

Document attaché : IFPEN-Centrale-Supelec-PhD-graph-learning-omics-bioinformatics-data-science.pdf

Categories: theses

CityEngine for Urbanity Visualization based on hypernetworks

Dec 31 2018 – Jan 1 2019 all-day

Annonce en lien avec l’Action/le Réseau : aucun

Laboratoire/Entreprise : University of Technology Troyes, Institut Charles Delaunay
Durée : 36 mois
Contact : eddie.soulier@utt.fr
Date limite de publication : 2018-12-31

Contexte :
Two distinct entities cannot occupy exactly the same point in a geographical space. Distance is a key variable of any social space. Space has attributes (scale, metric, topic…). Space is thus not absolute; it is relative and must be calculated. There are two ways to minimize the distance between two social realities, which is the cost function to be optimized: colocation and mobility. Finally, space only fully exists if it is used by actors or involved in some activity (spatiality). Space and spatiality form a multi-dimensional complex network termed an assemblage.
Space and spatiality are also defined by a space value. The city illustrates the relevance of this model. In terms of space, the city is a spatial object that privileges colocation: it gives access to a maximum of social realities at minimum time and cost. Seeking colocation aims to increase economic efficiency, develop social interactions, or improve city management. In terms of spatiality, the city can be defined by its urbanity (Levy and Lussault, 2003), i.e. by the conjunction of two factors: density and diversity of the co-located objects. The search for colocation increases both the density and the diversity of the co-located objects. Conversely, the simultaneous increase in mobility (displacement, telecommunication) privileges connectivity over immediate contact and leads to urban sprawl, and thus to weaker densities and, often, less diversity.
The city engine concept proposed in this research aims to compute, for specified urban situations (use cases), the space and spatiality hypergraph that minimizes distance and optimizes urbanity while taking mobility into account. Use cases could be: crime map, mobility management, improving cycling safety, smart elderly care system, smart commuting, personal emergency response, interactive street sensing, stimulating green behavior, etc.

Sujet :
2) Objective and challenges
Hypernetworks generalize the concept of a relation between two things to relations between many things. Relational simplices have multi-dimensional connectivity related to hypergraphs, simplicial complexes, and the Galois lattice of maximally connected sets of elements. This structure acts as a kind of backcloth for the dynamic system traffic represented by numerical mappings, where the topology of the backcloth constrains the dynamics of the traffic. Simplices provide a way of defining multilevel structure. Multilevel hypernetworks provide a significant generalization of network theory and set theory, enabling the integration of relational structures that are likely to be necessary for a science of complex multilevel socio-technical systems. The theory of hypernetworks is based on the earlier work of Ron Atkin, following the ideas of Clifford Hugh Dowker, and was generalized by J. Johnson (2013).
Two main challenges are to be considered:
1. Tools for manipulating simplicial complexes and hypernetworks.
2. Implement relational algebras to analyze massive heterogeneous data sets.
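
As a small illustration of the multi-dimensional connectivity of simplices mentioned above, the sketch below groups simplices into q-connected components in the sense of Q-analysis: two simplices are q-near when they share at least q + 1 vertices, and q-connectivity is the transitive closure of that relation. The simplex names and vertices are hypothetical urban examples, and the quadratic pairwise comparison is a simplification that would not scale to massive data sets.

```python
from itertools import combinations


def q_components(simplices, q):
    """Group named simplices (vertex sets) into q-connected components.

    Only simplices of dimension >= q (at least q + 1 vertices) take part.
    """
    names = [n for n, verts in simplices.items() if len(verts) >= q + 1]
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if len(simplices[a] & simplices[b]) >= q + 1:  # a and b are q-near
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=sorted)


# Hypothetical relations: each activity binds several urban elements together.
simplices = {
    "commute": frozenset({"home", "bike_lane", "station", "office"}),
    "errand": frozenset({"home", "bike_lane", "market"}),
    "leisure": frozenset({"park", "market"}),
}
for q in range(3):
    print(q, q_components(simplices, q))
```

Raising q splits the structure into more components, which is how the backcloth's connectivity can be read at different dimensional levels.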

Profil du candidat :
Academic Requirements:
Persons with a Master’s Degree or equivalent degree of higher education (Curriculum Vitae)
Algorithmic, Data Science, Machine Learning
Programming (Python, C/C++, Java)
Mathematical skills
Knowledge in probabilities and statistics

Formation et compétences requises :
Successful candidates should have a Master's degree in mathematical/statistical sciences, machine learning, statistical signal processing, or a closely related area.

Adresse d’emploi :
CNRS « Institut Charles Delaunay ». UTT – Université de Technologie de Troyes 12 rue Marie Curie – CS 42060 – 10004 TROYES CEDEX (and/or PARIS)

Document attaché : PhD-Proposal-ENGIE-UTT-V1.pdf

Categories: theses

PhD position in Data Science & Artificial Intelligence: Latent Data Models for Large-Scale Clustering

Dec 31 2018 – Jan 1 2019 all-day

Annonce en lien avec l’Action/le Réseau : aucun

Laboratoire/Entreprise : Lab of Mathematics Nicolas Oresme
Durée : 36 months
Contact : faicel.chamroukhi@unicaen.fr
Date limite de publication : 2018-12-31

Contexte :
This research will be performed in the framework of the ANR project SMILES: Statistical Modeling and Inference for unsupervised Learning at largE-Scale

Sujet :
Please see the attached pdf file for the description of the subject

Profil du candidat :
Required profile: Successful candidates should have a Master's degree in mathematical/statistical sciences, machine learning, statistical signal processing, or a closely related area, and strong skills in statistical inference and in programming with Matlab and/or R and/or Python. The PhD thesis, as well as all research reports and papers, will be written in English, so strong English writing and speaking skills are needed. Expected skills include unsupervised learning, model-based clustering, and distributed large-scale algorithms and computing. International applications are welcome to join our international team (no French skills are required). For candidates who wish to learn French, free courses are offered by the university/the doctoral school to foreign students.

Adresse d’emploi :
Université de Caen, Laboratoire de Mathématiques Nicolas Oresme, Normandie

Document attaché : PhD-ANR-SMILES.pdf

Categories: theses

PhD thesis: Mathematical morphology on elevation maps

Dec 31 2018 – Jan 1 2019 all-day

Annonce en lien avec l’Action/le Réseau : aucun

Laboratoire/Entreprise : IFP Energies nouvelles, CMM Mines ParisTech
Durée : 3 ans
Contact : maxime.moreaud@ifpen.fr
Date limite de publication : 2018-12-31

Contexte :
Elevation maps are images in which each point carries elevation information. These images are obtained by imaging processes involving 3D surface reconstruction, such as stereo reconstruction from at least two images observing the same scene from different viewing angles. Such elevation maps can be obtained by scanning electron microscopy (SEM) for objects at the micrometer scale, in particular by building on previous work by [Drouyer et al., 2017]. The targeted application is the advanced characterization of crystalline active phases and catalyst supports. The activity of these active phases is specifically linked to the particular orientations of certain crystal faces and also to the areas of these faces.

Sujet :
We will focus on developing operators for extracting geometric features such as granulometry (histogram of object sizes), surface area measurements, or the classification of surfaces according to the orientation of their normal.
To carry out this work, the envisaged approach differs from the classical one, which consists in reconstructing a triangulated 3D surface from the elevation map. That classical approach has several limitations: dependence on the triangulation method used, poor handling of strong discontinuities, and often complex 3D computations. We propose to work directly on the elevation maps using 2D image processing operations, in particular those from the field of mathematical morphology. The advantages of this original approach are, on the one hand, that the algorithmic operations are relatively fast and, on the other hand, that they use the initial data directly without transforming them.
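
To illustrate what working directly on the elevation map with 2D morphological operations can look like, here is a minimal sketch of grayscale erosion, dilation, and opening with a flat 3×3 structuring element. It is a textbook example rather than the operators to be developed in the thesis; the small elevation map and the border handling (the window is clipped at the image edge) are assumptions.

```python
def _filter(img, pick):
    """Apply `pick` (min or max) over each pixel's 3x3 neighbourhood."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # The window is clipped at the borders of the map.
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(pick(window))
        out.append(row)
    return out


def erode(img):
    return _filter(img, min)


def dilate(img):
    return _filter(img, max)


def opening(img):
    """Erosion followed by dilation: removes bright structures smaller than 3x3."""
    return dilate(erode(img))


# Hypothetical elevation map: a 3x3 plateau of height 5 and an isolated spike of height 9.
elev = [
    [0, 0, 0, 0, 0],
    [0, 5, 5, 5, 0],
    [0, 5, 5, 5, 0],
    [0, 5, 5, 5, 0],
    [0, 0, 0, 0, 9],
]
opened = opening(elev)
for row in opened:
    print(row)
```

Repeating the opening with structuring elements of increasing size and measuring what remains at each step is the usual route to a granulometry, one of the features named above.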

[Drouyer et al., 2017] Drouyer S., Beucher S., Bilodeau M., Moreaud M., Sorbier L. (2017) Sparse Stereo Disparity Map Densification Using Hierarchical Image Segmentation. In: Angulo J., Velasco-Forero S., Meyer F. (eds) Mathematical Morphology and Its Applications to Signal and Image Processing. ISMM 2017. Lecture Notes in Computer Science, vol 10225. Springer, Cham

Profil du candidat :
Master 2 (second-year Master's) or engineering degree

Formation et compétences requises :
Image processing, applied mathematics, C/C++ programming
Good command of French is essential; English is desirable

Adresse d’emploi :
IFP Energies nouvelles

Document attaché :

Categories: theses

Feb

Fri

2019

Interactive qualification model for imperfect 18th-century maritime trade data

Feb 15 – Feb 16 all-day

Annonce en lien avec l’Action/le Réseau : RoD

Laboratoire/Entreprise : UMR 7266 LIENSs, Université de la Rochelle; CNRS
Durée : 3 ans
Contact : christine.plumejeaud-perreau@univ-lr.fr
Date limite de publication : 2019-02-15

Contexte :
The PhD student will be hosted on the premises of the joint research unit Littoral Environnement et Sociétés (UMR 7266 LIENSs). This laboratory brings together scientific experts in ecology, geography, biology, history, molecular chemistry, and earth sciences, and addresses questions related to sustainable development and climate change around coastal zones (https://lienss.univ-larochelle.fr/). The PhD student will therefore join a strongly interdisciplinary environment, and in particular the database platform service, which sits at the crossroads of many scientific projects, in order to offer a better capacity for cross-referencing highly heterogeneous data and to promote the implementation of the FAIR principles in research.
The PORTIC team in La Rochelle is coordinated by Christine Plumejeaud-Perreau, who has worked for 5 years with Alain Bouju, associate professor (Maître de conférences) with a habilitation to supervise research in computer science at the Laboratoire d’Informatique, Images et Interactions (L3i, https://l3i.univ-larochelle.fr/) of the Université de la Rochelle.
Thesis supervisors: Alain Bouju (L3i) and Christine Plumejeaud-Perreau (LIENSs).

Sujet :
This subject is part of a program funded by the Agence Nationale de la Recherche (ANR), named PORTIC, which aims to study the spatial and economic dynamics at work in the construction of increasingly integrated markets that prepared and accompanied the Industrial Revolution. To this end, it will cross-reference data on shipping in French ports with data from the balance of trade in order to better grasp the articulation between the regional, national, and international spaces of French trade in the 18th century, drawing on two existing corpora, Navigocorpus and Toflit18, produced during two completed ANR programs. Cross-referencing the two corpora will make it possible, among other things, to estimate more precisely the respective shares of national and foreign trade, to refine knowledge of the ports that articulate markets and of their interactions, to analyze regional phenomena of specialization among several ports, to measure the impact of conflicts on a port's economy, to gauge smuggling across the Channel, to weigh the share taken by the French in international transport services, which escapes the trade statistics of the time, or to compute the ratio between the value of trade and the tonnage or workforce assigned to maritime transport according to flows.
PORTIC is a project co-constructed by historians, economists, geomatics specialists, computer scientists, and specialists in web-based information communication. It aims to offer tools allowing a clear, scientifically rigorous visualization of historical information, calibrated for different audiences, while fully taking its imperfect nature into account.
The imperfection of historical data derives from documentary gaps, contradictory information delivered by different sources, or imprecise content. This uncertainty in part of the information, fundamental to understanding the past, is currently insufficiently integrated by data visualization tools, particularly flow visualization tools.
Digital humanities accompany all stages of the project, first by making it possible to highlight aberrant and contradictory features of the data through mining tools and by setting up semi-automated interactive procedures through which researchers qualify the value of the information. Everything developed by PORTIC will be released under a free license.
This thesis project addresses the qualification of these data with an approach combining symbolic and numerical methods through an iterative process that integrates expert feedback for curating the corpus data.
Different aspects will be addressed during this thesis project:
– A semantic trajectory model derived from a generic spatio-temporal model (Tran et al. 2016) will be used to deduce inconsistencies in the database (contradictory information, inconsistent itineraries).
– This model will be connected to an engine running non-parametric, unsupervised statistical data mining methods for detecting recurring patterns and outliers.
– A semantic quality model will extend the current semantic model for trajectories in order to handle qualitative annotations.
– The results will be displayed in the data geo-visualization interfaces (developed elsewhere in the project), allowing the expert's comments to be integrated into the semantic model for an iterative exploration of different hypotheses. This implies support for non-monotonic reasoning in formal first-order logic.
The approach will be evaluated first by comparing old raw data sets with the same data sets already corrected manually, then with data newly collected in the project, by having the software interact with historians acting as experts.
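
The kind of itinerary inconsistency mentioned above can be illustrated with a minimal rule-based check over a ship's successive legs. This is only a sketch under simple assumptions, not the semantic trajectory model of the project: the `Leg` record, the three rules, and the sample voyages are hypothetical, and a flagged leg may simply reflect a gap in the sources, so the output is meant for expert review rather than automatic correction.

```python
from datetime import date
from typing import NamedTuple


class Leg(NamedTuple):
    ship: str
    origin: str
    departed: date
    destination: str
    arrived: date


def find_inconsistencies(legs):
    """Return (leg, reason) pairs for legs that contradict the previous leg of the same ship."""
    issues = []
    last_by_ship = {}
    for leg in sorted(legs, key=lambda l: (l.ship, l.departed)):
        if leg.arrived < leg.departed:
            issues.append((leg, "arrival precedes departure"))
        prev = last_by_ship.get(leg.ship)
        if prev is not None:
            if leg.departed < prev.arrived:
                issues.append((leg, "departs before previous arrival"))
            if leg.origin != prev.destination:
                issues.append((leg, "origin differs from previous destination"))
        last_by_ship[leg.ship] = leg
    return issues


# Hypothetical records for one ship, as two sources might report them.
legs = [
    Leg("Ship A", "La Rochelle", date(1787, 3, 1), "Bordeaux", date(1787, 3, 5)),
    Leg("Ship A", "Nantes", date(1787, 3, 4), "Brest", date(1787, 3, 10)),
]
issues = find_inconsistencies(legs)
for leg, reason in issues:
    print(leg.ship, leg.origin, leg.departed, "->", reason)
```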

Profil du candidat :
Desired experience in data mining (similarity detection), statistics, the Semantic Web, and linked data (LOD).

Formation et compétences requises :
Education: Master 2 (second-year Master's) in Computer Science / Knowledge Engineering

Adresse d’emploi :
LIttoral ENvironnement Et Sociétés (LIENSs)
2 Rue Olympe de Gouges, 17000 La Rochelle

Document attaché : projet-these_fr_en_20181217.pdf

Categories: theses

Mar

Sat

2019

Real-time technology-watch recommendation for evolving multi-criteria profiles