MaDICS

Optimization and interactive data analysis: the Data Traveler Problem (Problème du Voyageur de Données)

Sep 10 – Sep 11 all-day

Offre en lien avec l’Action/le Réseau : MADONA/– — –

Laboratory/Company: LIFAT
Duration: 3 years
Contact: Patrick.Marcel@univ-tours.fr
Publication deadline: 2020-09-10

Context:
Host team
Laboratoire d’Informatique Fondamentale et Appliquée de Tours (EA 6300 LIFAT) – Operations Research, Scheduling and Transport team (ERL CNRS 7002 ROOT) and Databases and Natural Language Processing team (BDTLN).

The CNRS ERL Operations Research, Scheduling and Transport (ROOT, see https://lifat.univ-tours.fr/teams/root/) and the Databases and Natural Language Processing team (BDTLN) offer institutional funding for a full-time PhD thesis starting in the first half of October 2020. The thesis will be based 50% in Tours and 50% in Blois.
The ROOT team specializes in scheduling and transport, two fields in which Operations Research tools are used. The BDTLN team specializes in databases, and in particular in interactive data analysis.

Subject:
Interactive data analysis is an iterative process that consists of performing an action (for example, a query on the data), receiving the result, and deciding on the next action to perform. Automating this task faces a number of obstacles: how to determine, among the multitude of data, the analysis path to follow; how best to chain the different types of actions (queries, model computation, etc.); how to determine that a result is interesting for a given analysis objective; how to present the result of an analysis as a data narrative (data storytelling); and so on.
The problem of interest in this thesis is to determine a set of queries to execute in sequence so as to maximize the interest of their results with respect to the user’s initial need. The execution time of the whole set of queries must also be taken into account, so that the results are obtained within a time that is reasonable for the user. The issue thus raised in the database field brings out an optimization problem for which Operations Research tools are relevant. A preliminary analysis yields a first model of this optimization problem as a traveling salesman problem (TSP) with particular constraints:
– the cities of the TSP are the analysis queries,
– the distances between cities correspond to the cognitive cost of moving from one query to another when building the narrative. The total cognitive cost (that is, the total distance between cities) must be minimized,
– unlike the classical TSP:
– it is possible here not to visit every city. Rejecting cities (queries) must therefore be considered, which brings out a knapsack-type problem. Each city carries a numerical value representing the expected gain with respect to the analysis task at hand, so the cities that maximize the total gain must be selected,
– each city also has a visit duration, which represents the execution time of the query. The sum of the visit durations must not exceed a given budget.
This optimization problem is NP-hard and has not been studied in the dedicated literature. Note that other models may be proposed, for example by taking into account a global constraint on the diversity of the selected queries.
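As a rough illustration of the model described above (not the formulation the thesis will adopt), the toy brute-force search below chooses which queries to run and in what order. The function names and the example data are hypothetical, and two modeling choices are assumptions made only for this sketch: the sequence is an open path rather than a closed tour, and the two objectives are combined lexicographically (highest total gain first, then lowest total cognitive cost).

```python
from itertools import combinations, permutations


def path_cost(order, dist):
    """Total cognitive cost of running the queries in the given order."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def best_query_sequence(gain, duration, dist, budget):
    """Exhaustively search subsets and orderings of queries.

    Returns (total_gain, total_cost, order) for the sequence with the
    highest total gain whose summed durations fit the budget; ties are
    broken by the lowest total cognitive cost (an assumption of this
    sketch). Exponential time: toy instances only.
    """
    n = len(gain)
    best = (0, 0, ())  # the empty sequence is always feasible
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if sum(duration[q] for q in subset) > budget:
                continue  # knapsack-style time budget exceeded
            total_gain = sum(gain[q] for q in subset)
            for order in permutations(subset):
                cost = path_cost(order, dist)
                # Prefer higher gain, then lower cost.
                if (total_gain, -cost) > (best[0], -best[1]):
                    best = (total_gain, cost, order)
    return best


# Hypothetical instance with four candidate queries.
gain = [5, 3, 4, 1]          # expected interest of each query
duration = [4, 2, 3, 1]      # execution time of each query
dist = [                     # cognitive cost of moving between queries
    [0, 2, 9, 4],
    [2, 0, 3, 7],
    [9, 3, 0, 1],
    [4, 7, 1, 0],
]
result = best_query_sequence(gain, duration, dist, budget=8)
```

Exhaustive search is only viable for a handful of queries, which is consistent with the NP-hardness noted above and with the call for dedicated exact and heuristic algorithms.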

The objective of this thesis is therefore to study and finely model the optimization problem posed, and to propose exact and heuristic algorithms from Operations Research (OR), evaluating them in the context of automating interactive data analysis. Depending on the candidate’s profile, we may consider using Machine Learning techniques suited to data exploration, coupled with the optimization algorithms from OR.

Candidate profile:
The successful candidate must have solid theoretical and practical knowledge of databases, particularly query expression and optimization. They must also master the tools of Operations Research (complexity, exact and heuristic methods, mathematical programming). Knowledge of machine learning will be a plus.

Required education and skills:
Master’s degree in Computer Science.

Work address:
LIFAT, Université de Tours, Tours campus and Blois campus.

Attached document: 202007240828_Offre PVD final-version diffusion BD.pdf

Categories: theses

Sep

Wed

Bridging the gap between stochastic methods and deep learning

Sep 30 all-day

Offre en lien avec l’Action/le Réseau : – — –/– — –

Laboratory/Company: Univ. Bretagne Sud, LMBA & IRISA Laboratories
Duration: 3 years
Contact: francois.septier@univ-ubs.fr
Publication deadline: 30/09/2020

Context:
The aim of this thesis is to develop theory and methods by combining ideas from computational statistics and machine learning, thus providing novel stochastic methods for simulating from complex, high-dimensional distributions.

The student will be supervised by:
• Lucas Drumetz: lucas.drumetz@imt-atlantique.fr
• Nicolas Courty: nicolas.courty@univ-ubs.fr
• François Septier: francois.septier@univ-ubs.fr

During the PhD, the candidate is expected to target top-tier machine learning conferences such as ICML, NeurIPS and AISTATS, and to build solid experience at the crossroads of computational statistics and machine learning.

To apply for this position, candidates are asked to first send us a CV and a cover letter before the end of May 2020.

Subject:
Many complex real-world phenomena can be described through probabilistic models that characterize available data, and possibly relate them to other unknown quantities of interest. Fields where such systems can be found include environmental science, biology, econometrics and astronomy, among many others. Unfortunately, for most probabilistic models of practical interest, exact inference is intractable, and so we have to resort to some form of approximation.

Monte Carlo methods are stochastic methods that make it possible to approximate distributions with a set of random samples [1]. Any moment or confidence region with respect to the distribution can then be approximated empirically by a discrete sum over the generated samples. Unfortunately, in most cases, sampling directly from the desired distribution is not possible, either because of its complex nature (multi-modality, high dimensionality, etc.) or because the distribution is only known up to a normalizing constant (e.g. in Bayesian inference). This has led in recent years to the development of much more advanced algorithms for obtaining the required samples from the target distribution, using either (a) Markov Chain Monte Carlo (MCMC) methods, which generate a Markov chain whose stationary distribution is the target distribution, or (b) importance sampling (IS) algorithms, where samples are generated from simple proposal densities and are then properly weighted. Despite the existence of theoretical guarantees, the convergence speed of such techniques strongly depends on choosing an appropriate proposal distribution, which is quite challenging in practice, especially in high-dimensional spaces [2].
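To make the importance sampling idea concrete, here is a minimal sketch of self-normalized importance sampling in plain Python. Everything in it is an illustrative assumption rather than part of the project: the target is a one-dimensional Gaussian shape centred at 3 and treated as known only up to a constant, the proposal is a wider zero-mean Gaussian, and the function names are made up for this example.

```python
import math
import random


def unnormalized_target(x):
    """Target density known only up to a constant: the shape of N(3, 1)."""
    return math.exp(-0.5 * (x - 3.0) ** 2)


def proposal_pdf(x, sigma):
    """Density of the N(0, sigma^2) proposal."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def snis_mean(n_samples, sigma=3.0, seed=0):
    """Estimate the target mean by self-normalized importance sampling.

    Returns (estimated mean, effective sample size).
    """
    rng = random.Random(seed)
    # Draw from the simple proposal instead of the target.
    xs = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
    # Importance weights: target over proposal, up to an unknown constant.
    ws = [unnormalized_target(x) / proposal_pdf(x, sigma) for x in xs]
    total = sum(ws)
    # Dividing by the sum of weights cancels the unknown constant.
    mean = sum(w * x for w, x in zip(ws, xs)) / total
    # Effective sample size: small when a few weights dominate.
    ess = total ** 2 / sum(w * w for w in ws)
    return mean, ess


estimate, ess = snis_mean(20000)
```

The effective sample size gives a rough sense of how well the proposal matches the target; a poorly chosen proposal concentrates the weight on a few samples, which is the difficulty the paragraph above describes for high-dimensional spaces.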

On the other hand, deep neural networks have achieved great success in approximating deterministic mappings on high-dimensional spaces in a range of challenging applications related to computer vision, natural language processing, etc. Unfortunately, modern deep learning models used in practice do not capture model uncertainty, as they only provide point estimates of parameters and predictions.

The aim of this thesis is to develop theory and methods by combining ideas from computational statistics and machine learning, thus providing novel stochastic methods for simulating from complex, high-dimensional distributions.

We propose to first study the current state of the art, and more specifically methods recently proposed in the literature, such as distilling importance sampling [3] or MetFlow [4]. These methods use the principle of normalizing flows, a family of generative models proposed in the ML community [5], to efficiently design the proposal distribution of classical sampling techniques. The theory of normalizing flows bears many similarities to Optimal Transport [6,7], in which the supervising team already has strong expertise [8–10]. The candidate will explore links between the two and should propose novel methods at the interface of those domains, such as [11]. We will then propose novel strategies for dealing with high-dimensional spaces as well as flows across different dimensions. An important aspect covered in this thesis is the design of online algorithms for state-space models. Applications will mostly cover environmental sciences, such as pollution tracking, and medical imaging, in collaboration with F. Rousseau (LATIM).

References
[1] C. P. Robert and G. Casella, Monte Carlo statistical methods. Springer, 2004.
[2] F. Septier and G. W. Peters, “Langevin and Hamiltonian Based Sequential MCMC for Efficient Bayesian Filtering in High-Dimensional Spaces,” IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 2, pp. 312–327, Mar. 2016.
[3] D. Prangle, “Distilling importance sampling,” arXiv.org, Oct. 2019.
[4] A. Thin, N. Kotelevskii, J.-S. Denain, L. Grinsztajn, A. Durmus, M. Panov, and E. Moulines, “MetFlow: A New Efficient Method for Bridging the Gap between Markov Chain Monte Carlo and Variational Inference,” arXiv.org, Feb. 2020.
[5] D. Rezende and S. Mohamed, “Variational Inference with Normalizing Flows,” in Proceedings of the 32nd International Conference on Machine Learning, F. Bach and D. Blei, Eds. Lille, France: PMLR, July 2015, pp. 1530–1538.
[6] F. Santambrogio, “Optimal transport for applied mathematicians.”
[7] G. Peyré, M. Cuturi, et al., “Computational optimal transport,” Foundations and Trends in Machine Learning, vol. 11, no. 5-6, pp. 355–607, 2019.
[8] T. Vayer, R. Flamary, R. Tavenard, L. Chapel, and N. Courty, “Sliced Gromov-Wasserstein,” in NeurIPS 2019 – Thirty-third Conference on Neural Information Processing Systems, vol. 32, Vancouver, Canada, Dec. 2019.
[9] K. Fatras, Y. Zine, R. Flamary, R. Gribonval, and N. Courty, “Learning with minibatch Wasserstein: asymptotic and gradient properties,” in AISTATS 2020 – 23rd International Conference on Artificial Intelligence and Statistics, ser. PMLR, vol. 108, Palermo, Italy, June 2020, pp. 1–20.
[10] T. Vayer, L. Chapel, R. Flamary, R. Tavenard, and N. Courty, “Optimal Transport for structured data with application on graphs,” in ICML 2019 – 36th International Conference on Machine Learning, Long Beach, United States, June 2019, pp. 1–16.
[11] L. Ambrogioni, U. Güclü, Y. Güclütürk, and M. van Gerven, “Wasserstein variational gradient descent: From semi-discrete optimal transport to ensemble variational inference,” arXiv:1811.02827, 2018.

Candidate profile:
We are looking for a motivated and talented student who should:
• Hold a master’s degree in applied mathematics: probability/statistics, machine learning, data science or signal processing,
• Have a strong background in scientific programming, preferably in Python and deep learning backends such as TensorFlow, JAX or Torch,
• Have English skills allowing scientific communication (oral/reading/writing).

Required education and skills:
see above

Work address:
Campus de Tohannic, rue André Lwoff, 56000 Vannes, FRANCE

Attached document: 202005041513_PhD_DynaLearn_UBS.pdf

Categories: theses

Statistical learning and random forests for spatio-temporal data

Sep 30 all-day

Offre en lien avec l’Action/le Réseau : – — –/– — –

Laboratory/Company: Université Bretagne Sud, LMBA UMR CNRS 6205
Duration: 3 years
Contact: francois.septier@univ-ubs.fr
Publication deadline: 30/09/2020

Context:
In this project, we will focus on the study of the random forest algorithm in the context of wireless sensor network (WSN) data. The aim of the PhD is therefore to propose rigorous and efficient random forest methods for spatio-temporal data. These new algorithms will be developed specifically to handle WSN data.

The student will be supervised by:
• Audrey Poterie: audrey.poterie@univ-ubs.fr
• François Septier: francois.septier@univ-ubs.fr

To apply for this position, candidates are asked to first send us a CV and a cover letter before the end of May 2020.

A fully funded PhD position (three-year contract) is available from September/October 2020 at the Université Bretagne Sud, located on the Tohannic campus in Vannes. The student will enjoy an international and creative environment with frequent research seminars and reading groups. This project will also benefit from strong collaborations with Dr. Ido Nevat, senior researcher at TUMCreate (Singapore). In particular, the methods developed in this project could be assessed and validated on data from the Cooling Singapore project, of which Dr. Ido Nevat is one of the leaders.

Subject:
During the past decades, wireless sensor networks (WSN) have attracted considerable attention due to the large number of applications in various fields, such as environmental monitoring, weather, health care and fire detection. In addition, WSN technology has been identified as one of the key components in designing future Internet of things (IoT) platforms. A WSN typically consists of a set of spatially distributed sensors that have generally limited resources, such as energy and memory. These sensors monitor a spatio-temporal phenomenon of interest that contains some desired attributes (e.g. wind speed, seismic activity, temperature, concentrations of substance, etc.).

In a centralized setting (the “ideal” situation), the sensors are assumed to be able to regularly communicate their observations to a base station (BS). The BS collects all these observations and fuses them in order to detect, predict or reconstruct the signal of interest, on the basis of which effective management actions are taken. Unfortunately, in practice, owing to the inherent resource constraints of the sensors (e.g. power, connectivity), the inference task has to be performed in a decentralized manner, which requires sensor nodes to communicate only with their one-hop neighbors. Furthermore, in very large WSNs, centralized sensor communication is often not possible. Since the rise of WSNs, many algorithms have been developed to improve the accuracy of such constrained networks on the task of interest. These algorithms now make increasingly intensive use of advanced machine learning (ML) techniques such as neural networks or decision trees; see [1] for a survey.

In this project, we will focus specifically on the study of the random forest algorithm in the context of WSN data. The aim of the PhD is therefore to propose rigorous and efficient random forest methods for spatio-temporal data. These new algorithms will be developed specifically to handle WSN data.

Random forest (RF), originally proposed in [2], is among the most successful statistical methods currently used for supervised statistical learning. Its popularity is mainly explained by its ease of implementation and its applicability to a wide range of fields, such as medicine [3,4] and ecology [5]. Although some applications to time series [6] and spatio-temporal data [7] can be found, and a variant of RF has recently been proposed for time series [8], RF does not inherently account for the space-time dependence structure of the data.

Using RF on WSN data therefore remains challenging. Some of the main issues are:
1. As mentioned previously, by assuming that the data are independent and identically distributed, RF does not account for the space-time dependence structure of the data.
2. Like most ML models, and unlike parametric models, RF does not require rigid statistical assumptions about the data. However, compared with a parametric approach, these methods generally require larger datasets, which can be difficult to obtain in real-life scenarios, especially in the decentralized setting where observations are available from only a very small number of sensors.
3. The resource constraints of each sensor imply a trade-off between model accuracy and computational cost.
4. RF cannot make predictions beyond the range of the training data (extrapolation). With WSN data, extrapolation methods are frequently used to address many problems, such as finding the optimal position of a new sensor or efficiently predicting a phenomenon of interest not only at the locations of the actual sensors but at all locations.
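The extrapolation issue in the last item can be seen on a single regression tree, the building block of a random forest. The sketch below is a deliberately simplified, hypothetical implementation (one feature, greedy squared-error splits, no bootstrap or feature sampling, made-up function names), not the random forest algorithm of [2]. It fits a tree on points from the line y = 2x with x between 0 and 9, then queries it far outside that range.

```python
def _mean(values):
    return sum(values) / len(values)


def _sse(values):
    """Sum of squared deviations from the mean."""
    m = _mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(pairs, min_leaf=2):
    """Fit a 1-D regression tree on (x, y) pairs sorted by x."""
    ys = [y for _, y in pairs]
    best = None  # (squared error, split index)
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        err = _sse(ys[:i]) + _sse(ys[i:])
        if best is None or err < best[0]:
            best = (err, i)
    if best is None:
        # Too few points to split: predict the mean of the leaf.
        return ("leaf", _mean(ys))
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    left = fit_tree(pairs[:i], min_leaf)
    right = fit_tree(pairs[i:], min_leaf)
    return ("node", threshold, left, right)


def predict(tree, x):
    """Follow the splits down to a leaf and return its mean."""
    while tree[0] == "node":
        _, threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree[1]


# Training data on the line y = 2x, for x = 0..9.
train = [(float(x), 2.0 * x) for x in range(10)]
tree = fit_tree(train)
at_edge = predict(tree, 9.0)      # largest x seen during training
far_away = predict(tree, 100.0)   # the true value on the line would be 200
```

Because every leaf predicts an average of training targets, the prediction at x = 100 falls in the same leaf as x = 9 and cannot exceed the largest training target. Averaging many such trees in a forest keeps the same bound.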

We propose first to explore the current state of the art in ML methods, especially RF, for data with a space-time dependence structure, and then to develop new RF approaches for WSN data. Methods commonly used for inference with WSN data, such as those involving Gaussian processes [9], will also be studied. Novel techniques integrating both these methods and RF could then be proposed in order to overcome some limitations of Gaussian process methods when dealing with WSN data. The PhD thesis will first focus on centralized WSNs. Next, networks whose sensors communicate in a decentralized way will be addressed, and the methods introduced for centralized WSNs could be extended to this more challenging situation. Extensive simulation studies and applications to real WSN data will be performed in order to assess the performance of each proposed approach.

References
[1] D. P. Kumar, T. Amgoth, and C. S. R. Annavarapu, “Machine learning algorithms for wireless sensor networks: A survey,” Information Fusion, vol. 49, pp. 1–25, 2019.
[2] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32, 2001.
[3] A. Poterie, J. Dupuy, V. Monbet, and L. Rouvière, “Classification tree algorithm for grouped variables,” Computational Statistics, vol. 34, pp. 1613–1648, 2019.
[4] R. Diaz-Uriarte and S. A. De Andres, “Gene selection and classification of microarray data using random forest,” BMC bioinformatics, vol. 7, no. 1, pp. 1–3, 2006.
[5] D. R. Cutler, T. C. Edwards Jr, K. H. Beard, A. Cutler, K. T. Hess, J. Gibson, and J. J. Lawler, “Random forests for classification in ecology,” Ecology, vol. 88, no. 11, pp. 2783–2792, 2007.
[6] A. Fischer, L. Montuelle, M. Mougeot, and D. Picard, “Statistical learning for wind power: A modeling and stability study towards forecasting,” Wind Energy, vol. 20, no. 12, pp. 2037–2047, 2017.
[7] S. Georganos, T. Grippa, A. Niang Gadiaga, C. Linard, M. Lennert, S. Vanhuysse, N. Mboga, E. Wolff, and S. Kalogirou, “Geographical random forests: a spatial extension of the random forest algorithm to address spatial heterogeneity in remote sensing and population modelling,” Geocarto International, vol. 7, no. 1, pp. 1–16, 2019.
[8] P. Joslin, “Prévision multi-échelle par agrégation de forêts aléatoires. application à la consommation électrique.” Ph.D. dissertation, Thèse de doctorat de Mathématiques appliquées, Université Paris Saclay, 2019.
[9] P. Zhang, I. Nevat, G. W. Peters, F. Septier, and M. A. Osborne, “Spatial field reconstruction and sensor selection in heterogeneous sensor networks with stochastic energy harvesting,” IEEE Transactions on Signal Processing, vol. 66, no. 9, pp. 2245–2257, 2018.

Candidate profile:
We are looking for a motivated and talented student who should:
• Hold a master’s degree in applied mathematics: probability/statistics, machine learning, data science or signal processing,
• Have a strong background in scientific programming, preferably in R and/or Python,
• Have English skills allowing scientific communication (oral/reading/writing).

Required education and skills:
see above

Work address:
Campus de Tohannic, rue André Lwoff, 56000 VANNES, France

Attached document: 202005041516_PhD_RandomForest_UBS.pdf

Categories: theses

Oct

Creation of a statistical model for predicting the position of an aircraft

Oct 29 all-day

Offre en lien avec l’Action/le Réseau : – — –/– — –

Laboratory/Company: Laboratoire de Recherche de l’ENAC
Duration: 3 years
Contact: laurent.lapasset@recherche.enac.fr
Publication deadline: 2021-01-01

Context:
Keywords: image and time-series analysis/processing, AI, Machine Learning and Deep Learning

Thesis start date: autumn 2020

Funding type: CIFRE thesis

Contact:
Thierry Klein
Laurent Lapasset

Location:
ENAC, 7, Avenue Edouard Belin, 31400 Toulouse

One of the main objectives of the DSNA (Direction des services de la Navigation aérienne) is the modernization of air navigation.
The technical modernization plan takes into account the European research program “SESAR”, the technological pillar of the “Single European Sky” project, and in particular the regulatory framework defined by the “Pilot Common Project”, the first stage of deployment of this work from 2015 onward.
It relies on new-generation tools (4-Flight, ERATO…) and optimized operational procedures for all phases of flight (EGNOS, Airport CDM…), and more particularly on the SALTO project, to which this thesis is related.

The SALTO project (Swift ATFCM/ASM Local Traffic Optimizer) is a project for which the DGAC is the contracting authority and Capgemini is the prime contractor.

SALTO makes it possible to monitor the control sectors of a CRNA.
It is a tool tailored to the needs of FMPs (Flow Management Positions). SALTO meets operational needs during the strategic, pre-tactical, tactical and post-analysis phases.

Subject:
We will retrieve the actual trajectories recorded by the SALTO project. This type of spatio-temporal data raises a set of questions that are classical in data mining but more original when the data are grouped into trajectories. Can a typical trajectory be defined? Can abnormal trajectories be detected? How should similar trajectories be grouped?
Such a study relies on building a distance suited to the particular properties of trajectory data. Trajectories are sequences of points in the plane indexed by time. Comparing them requires taking into account not only the departure and arrival points, which define the route, but also their length and shape, as well as the associated time window.
Many distances have already been developed for this purpose. In all cases, the distances we use will be based on the following criteria: the physical distance between two trajectories, the shape of the trajectories (orientation, length) and the “temporal dependence”.

Candidate profile:
M2 (second-year Master’s) / engineering degree

Required education and skills:
Computer science, imaging, time series, statistics, AI

Work address:
7, Avenue Edouard Belin, 31400 Toulouse

Attached document: 202008131017_Cap-Sujet-developpe.pdf

Categories: theses

Oct

Sat

Optimization of activity pattern detection on ethical and responsible digital traces

Oct 31 – Nov 1 all-day

Offre en lien avec l’Action/le Réseau : – — –/Doctorants

Laboratory/Company: DVRC (ALDV) / Cedric (Cnam) / Kwanko
Duration: 36 months
Contact: nicolas.travers@devinci.fr
Publication deadline: 2020-10-31

Context:
The world of digital marketing through Real Time Bidding (RTB) is based on tracking and analyzing user behavior on the web. By tracking users on the web through their interactions with web browsers, smartphones, emails or advertisements, RTB seeks to maximize the dissemination of information on the Web. Thus, by following the user’s digital journey, RTB adapts advertising campaigns to user profiles. With the arrival of the GDPR (General Data Protection Regulation), data confidentiality has become a major issue for companies working in e-commerce. Having to manage user profiles while preserving their privacy adds complexity that runs counter to the principle of traditional user tracking [1,2,3]. It therefore becomes necessary to define user profiles adapted to the new standards and thus produce ethical tracking.
In addition, the traffic generated by the RTB process has a huge impact on resource consumption, on the network, on the computing servers, and in the user’s environment. Recently, the theme of responsible marketing, or ecodesign of media (Green Design), has been emerging [4,5], but the RTB field is still slow to evolve toward such modular designs of the tracking and analysis process.
The possibility of combining RTB with Green Design thus becomes a strong argument in an advertising campaign. Kwanko seeks to meet these two challenges by adapting its RTB processes. Founded in 2003, Kwanko is a major player in performance digital advertising on the Web, mobile and tablets. Its purpose is to support advertisers with traceability and with maximizing the impact of their advertising campaigns. Kwanko makes it easier for brands to connect with their audiences on the web. The problem addressed in this research subject is multi-faceted. To maximize the impact of an RTB campaign, we must at the same time preserve the possibility of tracking users to maximize the “transformation” (optimal tracking), minimize the energy impact of the analysis and tracking process (responsible tracking), and maximize the protection of the user’s privacy (ethical tracking). This problem combines opposing dimensions and therefore amounts to a multi-constraint optimization problem.

This doctoral thesis will be funded by a CIFRE contract with Kwanko, in partnership with the DVRC laboratory of the Leonardo da Vinci Association (Paris La Défense) within the digital group, supervised by Nicolas Travers (HDR) and Cédric du Mouza (HDR).

Subject:
Towards responsible digital traces
In the first part, we plan to redefine the tracking and analysis process in modular microcomponents [3]. The idea is to dissociate personal data from the analysis by producing an adaptive data model that will serve as a common model for the analysis steps. The separation into microcomponents makes it possible to quantify the energy impact of each component and thus to optimize it to reduce the cost. First, the complexity of the processing performed in each component associated with the amount of data to be processed (depending on the user profile) gives the cost of each step of the analysis. The combination of microcomponents based on unit operations produces an algebraic expression whose operations are interchangeable for optimization. The overall complexity of the algebraic expression thus gives the energy impact of the RTB analysis. To reduce the energy impact, an initial heuristic will try to allocate the task to the optimal location to reduce the overall impact, either by pooling multi-campaign calculations, or by pooling user profiles. The relevance of a campaign with the user profile can be calculated both on the browser and on the server.

Towards ethical digital traces
In the second step, we will rely on the common data model used in the tracking process to preserve the user profile. The aim is to reduce the dependence of classical analysis models on user profiles, a need amplified by the tendency to block these trackers [6]. We will thus be able to tune the precision of the analysis according to the users’ acceptance of profiling, moving toward ethical tracking. As in visual tracking techniques [7], privacy preservation strategies are based on defining activity patterns in order to detect specific patterns (Activity Pattern Detection). Our data model can be oriented toward Activity Patterns for RTB. The profile will be analyzed on the user side to generate local detections based on a dedicated campaign. The result then yields a recommendation for targeting the user with relevant advertising while maximizing privacy protection. Another option under consideration is to define a multidimensional targeting model for campaigns and to place the user in it. To guarantee anonymization, we will move toward random allocation techniques with probabilistic guarantees, as used in privacy-preserving secure allocation of queries [8]. This approach will allow the user profile to be projected onto campaign profiles, so that the user can be targeted without being identified.

Towards an optimal digital trace calculation
The cost model based on energy impact will therefore depend on the complexity of the components, their combination for analysis, the level of privacy protection, the amount of data available, and the level of precision expected at output. Multi-criteria optimization is therefore necessary to guide the choice of an analysis solution suitable for a set of advertising campaigns. The idea for Kwanko is to offer a service that can be adapted to each client by addressing three dimensions of tracking that are hard to reconcile: ethical, responsible and optimal. Clients will be able to emphasize one dimension according to the impact they wish their campaign to have.

Candidate profile:
Holder of a Master’s degree in IT with solid knowledge of data distribution and pattern mining, and possibly of secure data processing; strong development experience is also recommended.

Required education and skills:
BAC+5 (Master’s level) in Computer Science – DBMS / distributed systems / IS

Work address:
Kwanko 60 BD DU MARECHAL JOFFRE 92340 BOURG-LA-REINE
DVRC Pôle Universitaire Léonard de Vinci 92 916 Paris La Défense Cedex

Attached document: 202009081502_PHD_Kwanko.docx

Categories: theses

Nov

Mon

CIFRE PhD offer: entity and relation extraction in the scientific domain

Nov 30 – Dec 1 all-day

Offre en lien avec l’Action/le Réseau : – — –/– — –

Laboratory/Company: Laboratoire d’Informatique de Paris Nord (LIPN)
Duration: 3 years
Contact: tomeh@lipn.fr
Publication deadline: 2020-11-30

Context:
Laboratory:
The PhD student will join the Knowledge Representation and Natural Language team (RCLN; https://lipn.univ-paris13.fr/accueil/equipe/rcln/) of the Laboratoire d’Informatique de Paris Nord (LIPN), UMR CNRS 7030, attached to Université Sorbonne Paris Nord. The RCLN team is a member of the EFL laboratory of excellence (Empirical Foundations of Linguistics; http://www.labex-efl.com).

Company:
An international consulting group in the field of Research, Development and Innovation, working on organizational, structural, methodological, scientific and financial aspects. The company advises the most advanced businesses in their sectors, those that innovate and build their strategic lead on experimental and fundamental research, with more than 20 years of experience and several thousand collaborations worldwide, on every continent. Within the group, the Scientific Department is responsible for coordinating all scientific activities at the operational, methodological and conceptual levels. In France, it draws on the resources of more than 60 PhD holders from all disciplines. Within this Scientific Department, the PhD student will join the Research Lab team, the group’s internal R&D department, based at La Défense in Paris.

Sujet :
Le/la doctorant.e travaillera sur la conception et la mise en oeuvre d’un système d’extraction jointe d’entités et de relations sémantiques à partir de textes écrits par des experts dans des différents domaines techniques.

Sujet détaillé :
https://lipn.univ-paris13.fr/~tomeh/public/uploads/offers/phd-cifre-relation-extraction.pdf

Keywords:
Natural language processing; Deep learning; Entity and relation extraction; Dependency parsing; Machine learning; Knowledge representation

Candidate profile:
–

Required education and skills:
Master 2 (or equivalent) in computer science or applied mathematics.
Specialization in natural language processing (NLP) or machine learning;
Knowledge of neural networks and deep learning;
Knowledge of knowledge representation would be appreciated;
Good command of Python and C++;
Good level of English;
Good level of French.

Work address:
La Défense, Paris.

Categories: theses

Research assistant position in Greifswald, Germany

Dec 10 – Dec 11 all-day

Offer linked to Action/Network: –/–

Laboratory/Company: Institut für Mathematik und Informatik, University
Duration: 3 years
Contact: joscha.diehl@uni-greifswald.de
Publication deadline: 2020-12-10

Context:
This position is part of the international project EDDA between France, Germany and Japan, jointly funded by ANR, DFG and JST (https://sites.google.com/view/project-edda/project-edda).
Background on the project
Our international and interdisciplinary team of researchers from machine learning, algebra, stochastic analysis, data assimilation and oceanography aims to
• develop interpretable features of multi-dimensional time series in a rigorous algebraic framework, based on the iterated-integrals signature, for the analysis of dependence, synchronization and structure
• understand how to extract these features in a robust fashion
• develop statistical guarantees for these features in the setting of standard time-series models and benchmark on synthetic data
• use these new, as well as existing, statistical methods to perform original investigations on oceanic and climate data

Subject:
Prof. J. Diehl is looking to fill the position of a Research Assistant (3 years, 75%, pay group 13 TV-L Wissenschaft) at the University of Greifswald in the trilateral project EDDA (https://sites.google.com/view/project-edda/project-edda).
This is a pure research position; however, if desired, it is possible to take on teaching responsibilities.
The position is suitable for the preparation of a doctorate.

Candidate profile:
Job requirements
• Advanced university degree in mathematics, statistics, physics or computer science (Master or equivalent)
• Fluency in English
• Enthusiasm for data science from a theoretical perspective
• Interest in working with real-life data
• Programming experience is desirable
• Knowledge of algebra (e.g. non-commutative, commutative, representation theory, algebraic topology) and statistics is an advantage

Work address:
Universität Greifswald, Institut f. Mathematik u. Informatik, 17487 Greifswald (Germany)

Attached document: 202011111424_phd-position-greifswalds-2020.pdf

Categories: theses

Intent Based Networking: Cross-Layer Modeling and Signaling

Dec 18 – Dec 19 all-day

Offer linked to Action/Network: –/–

Laboratory/Company: CNAM, CEDRIC
Duration: 3 years
Contact: elena.kornyshova@cnam.fr
Publication deadline: 2020-12-18

Context:
Research laboratory: Computer Science and Communications department (CEDRIC; https://cedric.cnam.fr)
Research team: Networks and IoT Systems (ROC – Réseaux et Objets Connectés; https://roc.cnam.fr)

Related collaborative projects:
* H2020 AI@EDGE: European project on AI for beyond-5G networks, with 19 European partners.
* ANR INTELLIGENTSIA: national project on network automation with Orange, Inria, Acklio, Aguila.

Period: The 3-year contract would start in January 2021, but the start can be delayed by a few months.

Salary: Approx. 24 000 € gross/year – 1 700 € net/month (before “prélèvement à la source”). In addition, 50% of the public transportation subscription can be reimbursed. Optional: teaching activities in French and/or English for up to 64 h/year, 2 650 €/year.

Subject:
Intent-Based Networking (IBN) is a new paradigm arising in network management. It is driven by the possibility of leveraging network programming capabilities to implement service provisioning “intents”. An IBN solution meets “What to achieve” requirements expressed by users through User-to-Network Interfaces (UNIs); it is meant to support business goals and translate them into policies.
The interaction of an IBN system with a programmable infrastructure happens through Northbound Interfaces (NBIs) at the resource level, but can be the result of a composite intent translation chain from the UNI to many NBIs. For instance, in the emerging software-defined Radio Access Network environment (using ORAN for radio resource scheduling and ONAP for the orchestration layer), IBN can appear as orchestration intents at the ONAP user interface and, at the ORAN resource scheduling level, at the near-real-time controller, by means of the ORAN NBI named A1 in the specifications. The orchestration layer can solicit as many NBIs as there are resources (link, computing), so intents at the orchestration layer have to be deployable at the resource layers as resource-level intents. The standpoint of this project is therefore that IBN-driven service orchestration is implemented across multiple resource-level NBIs, similarly to information system architectures.
State-of-the-art work applying the IBN framework to edge network infrastructure exists, such as [1] for vehicular applications; software-defined exchanges are defined therein as middleware for inter-layer IBN communications. A similar concept is used in [2,3] to consider the context for the IBN definition, touching several technical components. Techniques to identify and process intents via context characteristics using artificial intelligence frameworks are studied in [1]. Nonetheless, considerable confusion persists about the precise definition of an intent (different concepts are used to present intents: intentions, objectives, or requirements) and its link to resource-level IBN configuration rules. For instance, today's major SDN controllers (e.g. ONOS, ODL) only very partially develop IBN capabilities. Only in [2] are the context characteristics used to define intents clearly stated, such as traffic profile, required network function and device information. In this sense, AI and Machine Learning (ML) can help in defining methods to identify intentions and relevant context characteristics, and to map them to an orchestration-level IBN process, which is then translated into resource-level IBN policies.
In this respect, the PhD project will address the following challenges:
– qualify the notion of intent in IBN, including its cross-layer implications in orchestration and resource-level systems, with a rigorous intent taxonomy that can be customized.
– define the context characteristics that should be taken into account by AI/ML processes or human-driven systems in the definition of intents.
– design a cross-layer IBN framework with signaling requirements from UNI to NBI levels for expressing different network automation flavors (e.g., planning, real-time).
– experimentally showcase the IBN signaling framework and its utility in network automation, using existing open networking software platforms.
References:
[1] A. Singh et al., “Intent-Based Network for Data Dissemination in Software-Defined Vehicular Edge Computing”. IEEE Transactions on Intelligent Transportation Systems, 1–9, 2020, early-access.
[2] D. Comer, A. Rastegatnia, “OSDF: An Intent-based Software Defined Network Programming Framework,” 2018 IEEE 43rd Conference on Local Computer Networks (LCN).
[3] J. Pan, McElhannon, “Future Edge Cloud and Edge Computing for Internet of Things Applications”. IEEE Internet of Things Journal 5 (1): 439–49, 2017.

Candidate profile:
Master’s degree in computer science, computer engineering, or telecommunications engineering.

Required education and skills:
Solid knowledge of networks, IS, and machine learning

Work address:
2, rue Conté, Paris 75003

Categories: theses

Towards an infrastructure for sourced, reproducible and verifiable knowledge graphs.

Dec 26 – Dec 27 all-day

Offer linked to Action/Network: –/–

Laboratory/Company: CNRS/LIRIS – INSA de Lyon
Duration: 36 months – starting
Contact: Sylvie.Cazalens@insa-lyon.fr
Publication deadline: 2020-12-26

Context:
The thesis will take place in the database team (DB) of the LIRIS laboratory, Campus de la Doua, Lyon-Villeurbanne (liris.cnrs.fr).

It is part of the ANR project DeKaloG (2021-2024) which aims at a general framework to build community, decentralized knowledge graphs according to principles of accessibility and transparency. The project gathers the efforts of three teams: GDD (LS2N, Nantes), Wimmics (Inria, Sophia Antipolis and Université Côte d’Azur) and BD (LIRIS, Lyon).

Subject:
The general aim of this thesis is to contribute to the DeKaloG framework, with a focus on transparency, and more particularly on reproducibility: knowledge within the graph (or the graph itself) should be reproducible and verifiable. For facts deduced within the graph, one may rely on work on provenance. However, for facts obtained using external tools and directly introduced into the graph, questions about provenance, reproducibility and verifiability have to be addressed. The following objectives should be targeted:
– Defining requirements for an extensible model of transparency, up to reproducibility. A first step consists in drawing a complete picture of the needs in the context of knowledge graphs, leveraging results in related domains such as linked data and the semantic web, as well as achievements in other scientific domains (medicine, biology, etc.). A second step consists in designing an extensible model of different levels of transparency that can be queried and is consistent with current semantic web standards.

– Use of the proposed model to enable more transparency in knowledge graphs. This requires injecting more metadata into knowledge graphs, which raises problems of data volume and thus of performance. This is a major hindrance to scalability. Recent approaches to this problem provide a starting point. Additionally, linked data and workflows can be intertwined to push transparency up to reproducibility.

– Estimating/verifying the transparency degree of a KG. Users should be able to obtain information qualifying and quantifying the transparency degree of a knowledge graph they want to use. This is also very important when building an index of knowledge graphs.

Hence, this thesis should result in an infrastructure that makes it possible to link a knowledge graph with external solutions, accessed through services, so that KGs and facts are reproducible and verifiable by anyone.

Candidate profile:
Applicants should have both theoretical and applied skills in computer science, in particular a good knowledge of semantic web/knowledge graph foundations and good practical experience with the associated tools. A background in the domain of workflows would be appreciated.

Required education and skills:
Any diploma equivalent to a French “master en informatique”

Work address:
LIRIS – UMR 5205 CNRS
Bâtiment Blaise Pascal – INSA Lyon
7 avenue Jean Capelle, 69100 Villeurbanne
France

Attached document: 202010081715_ThesisReproducibleKG.pdf

Categories: theses

Deep learning and Geophysical Extremes

Dec 31 2020 – Jan 1 2021 all-day

Offer linked to Action/Network: MACLEAN/–

Laboratory/Company: LSCE/Lab-STICC
Duration: 36 months
Contact: ronan.fablet@imt-atlantique.fr
Publication deadline: 2020-12-31

Context:
Extremes are crucial features of geophysical processes and can play a fundamental role in terms of societal impacts, e.g. major floods. By definition, extreme events are rare, but they happen and records are made to be beaten. In terms of machine learning algorithms, it is difficult to learn from very few examples even in a large learning database. In addition, the probability distribution of extreme events cannot be well captured by measures based solely on deviations from the mean. These two issues clearly challenge the classic learning paradigm. From an uncertainty point of view, there exists a probability theory tailored to model extremal behavior, the so-called Extreme Value Theory (EVT).

Subject:
The main task of the PhD student will be to build bridges between physics-informed neural networks (NN) and multivariate EVT used in environmental statistics. A major bottleneck in coupling NN and EVT techniques is the question of metrics for rare events, and how to assess predictive distributions from forecast models. These two aspects will be studied in detail during the PhD.

This PhD will be part of the ANR Melody project. This implies that the main application domain will be the field of ocean dynamics and, consequently, all algorithms will be tested on low-dimensional toy models (Lorenz models) or intermediate-size models (1D-Burgers equations, 2D-QG models, etc.).

The PhD will be co-supervised by Dr. P. Naveau (CNRS, LSCE), Dr. A. Sabourin (IMT, LTCI) and Dr. R. Fablet (IMT, Lab-STICC). The PhD could take place in Paris and/or Brest.

Candidate profile:
MSc and/or engineering degree in data science, statistical learning and/or geosciences (ocean dynamics)

Required education and skills:
Programming skills (Python)
Machine learning and deep learning skills (e.g., scikit-learn, pytorch, keras…)

Work address:
LSCE (Paris, Saclay) and/or IMT Atlantique (Brest)

Categories: theses

Funded PhD [Univ Paris / Philips]: CBIR and radiology

Jan 1 – Jan 2 all-day

Offer linked to Action/Network: –/–

Laboratory/Company: LIPADE, Univ Paris
Duration: 3 years
Contact: camille.kurtz@u-paris.fr
Publication deadline: 2021-01-01

Context:
Keywords: Medical image analysis / processing, AI, deep learning, content-based image retrieval, semantics, image indexing, hospital radiology image database – PACS

Link to the detailed subject:

http://w3.mi.parisdescartes.fr/sip-lab/files/iaPACS_sujetDeDoctorat_Informatique_IA_AnalyseImagesMedicales.pdf

Thesis start date: start of the 2020 academic year

Funding type: Île-de-France region

Contact:
Camille KURTZ
Florence CLOPPET

Location:
LIPADE (Laboratoire d’Informatique de Paris Descartes), Université de Paris, 45 rue des Saints-Pères, 75006 Paris
Philips, 33 Rue de Verdun, 92156 Suresnes

Subject:
This doctoral project concerns the integration of a content-based image retrieval engine into a medical image database, to assist radiologists in interpreting images in clinical routine and in reaching a diagnosis. The subject is a collaboration between the LIPADE computer science laboratory of Université de Paris, Philips Healthcare, a leader in the development and marketing of image acquisition devices, and the HEGP (Hôpital européen Georges-Pompidou). From an academic point of view, the innovation lies in developing new approaches for indexing and retrieving similar images using visual descriptors derived from deep (convolutional) neural networks, and in coupling them with the high-level semantic descriptors used by radiologists. From an industrial point of view, the project is a technological break with existing systems: the radiological image information and management systems currently deployed in hospitals for routine use can only be queried by keyword, and content-based image mining cannot be exploited despite the mass of previously diagnosed and interpreted images available.

Candidate profile:
M2 / engineering degree

Required education and skills:
Computer science, imaging, AI

Work address:
http://w3.mi.parisdescartes.fr/sip-lab/files/iaPACS_sujetDeDoctorat_Informatique_IA_AnalyseImagesMedicales.pdf

Categories: theses

Predictive Query Optimization for Multi-tenant Cloud DBMSs

Mar 15 all-day

Offer linked to Action/Network: –/PhD students

Laboratory/Company: IRIT Institut de Recherche en Informatique de Toul
Duration: 3 years
Contact: hameurlain@irit.fr
Publication deadline: 2021-03-31

Context:
In parallel and distributed large-scale environments (Cluster, Grid, Cloud), the Pyramid team addresses the main problems of query processing and optimization, targeting large volumes of data distributed at large scale. In cloud environments, users are often called tenants. A cloud DBMS shared by many tenants is called a multi-tenant DBMS. Resource consolidation in such a DBMS allows the tenants to pay only for the resources that they consume, while giving the provider the opportunity to increase its economic gain. For this, a Service Level Agreement (SLA) is usually established between the provider and a tenant. However, in current systems, the SLA is often defined by the provider, and the tenant has to agree to it before using the service. In addition, only the availability objective is described in the SLA, not the performance objective. In one of our previous works [8], an SLA negotiation framework was proposed for OLAP applications, in which the provider and the tenant define the performance objective together in a fair way. To demonstrate the feasibility and the advantage of this framework, we evaluated its impact on query optimization. We formally defined the problem by including the cost-efficiency aspect, we designed a cost model and improved two execution plan search methods to adapt them to the new context, and we proposed a heuristic to solve the resource contention problem caused by concurrent queries from multiple tenants. We also conducted a performance evaluation to show that our optimization approach (i.e., driven by the SLA) can be much more cost-effective than the traditional approach, which always minimizes the query completion time.

Subject:
In the above work, we proposed a new criterion: the Unit Benefit Factor (UBF), which is the profit generated per unit of time (by the execution of a query). For example, if a query lasts 2 seconds and brings the provider 10 cents of profit, the UBF is 5 cents / second. For each given query, the optimizer chooses the execution plan that maximizes this criterion. Obviously, this does not guarantee the maximum profit when considering all the queries of all tenants in the long term. Indeed, the workload of a multi-tenant DBMS varies over time and influences both the QoS (tenant side) and the economic cost (provider side). Some work proposes to build models in order to predict the future load [2, 4, 6, 9]. This prediction can help the optimizer to choose execution plans that improve both QoS and profitability in the long term [1]. Taking this prediction into account (it becomes a new constraint) requires extending the cost model and revisiting the search strategy.
From this perspective, the candidate is expected to design and develop a query optimization method that takes the workload prediction into account. More precisely, she/he will: (i) study the related work [e.g., 2-9], (ii) propose a predictive query optimization method that maximizes the provider’s long-term profit while meeting the SLAs established with the tenants, and (iii) conduct an experimental study to evaluate and validate the proposed method.
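
As an illustration of the UBF criterion described above, here is a minimal Python sketch of the per-query greedy choice (the plan with the highest profit per unit of time). The `Plan` class, the plan names and all figures other than the 10 cents / 2 seconds example are hypothetical and only serve the illustration; this is not the cost model of [8].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate execution plan (hypothetical fields, for illustration only)."""
    name: str
    profit_cents: float  # provider profit if this plan is executed
    duration_s: float    # estimated execution time in seconds


def unit_benefit_factor(profit_cents: float, duration_s: float) -> float:
    """UBF: profit generated per unit of time, in cents per second."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return profit_cents / duration_s


def best_plan(plans):
    """Greedy per-query choice: return the plan with the highest UBF."""
    return max(plans, key=lambda p: unit_benefit_factor(p.profit_cents, p.duration_s))


# Hypothetical candidate plans for a single query.
plans = [
    Plan("plan_a", profit_cents=6.0, duration_s=1.0),
    Plan("plan_b", profit_cents=10.0, duration_s=2.0),  # the example from the text
    Plan("plan_c", profit_cents=12.0, duration_s=4.0),
]

print(unit_benefit_factor(10.0, 2.0))  # UBF of the example from the text
print(best_plan(plans).name)
```

Note that the plan with the largest absolute profit (`plan_c`) is not the one selected: the criterion favours profit per second, which is why it says nothing about long-term profit once the future workload is taken into account.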

References

[1] Abadi, D., et al. ; The Seattle Report on Database Research; SIGMOD Record, December 2019, Vol. 48, No. 4.
[2] Picado, J., Lang W., Thayer E.C.; Survivability of Cloud Databases – Factors and Prediction. SIGMOD ’18: Proceedings of the 2018 International Conference on Management of Data. May 2018, p. 811-823.
[3] Pietri, I., Chronis, Y., and Ioannidis, Y.; Fairness in Dataflow Scheduling in the Cloud. Information Systems, Elsevier, Vol. 83, 2019, p. 118-125.
[4] Taft, R., El-Sayed, N., Serafini, M. , Lu, Y., Aboulnaga, A.I., Stonebraker, M., Mayerhofer, R., and Andrade, F. ; P-Store: An Elastic Database System with Predictive Provisioning. SIGMOD ’18: Proceedings of the 2018 International Conference on Management of Data, May 2018, Pages 205-219.
[5] Tan, Z., and Babu, S. Tempo: robust and self-tuning resource management in multi-tenant parallel databases. Proceedings of the VLDB Endowment 9.10, 2016, p. 720-731.
[6] Viswanathan, L., Chandra, B., Lang, W., Ramachandra, K., Patel, JM., Kalhan, A., DeWitt, D. J., and Halverson, A.; Predictive Provisioning: Efficiently Anticipating Usage in Azure SQL Database. IEEE 33rd International Conference on Data Engineering (ICDE), 2017, p. 1111-1116.
[7] Wong, P., He, Z., Feng, Z., Xu, W., and Lo, E.; Thrifty: Offering Parallel Database as a Service using the Shared-Process Approach. SIGMOD Conference 2015, p. 1063-1068.
[8] Yin, S., Hameurlain, A., and Morvan, F.; SLA Definition for Multi-tenant DBMS and its Impact on Query Optimization. IEEE Transactions on Knowledge and Data Engineering, Vol. 30, N. 11, 2018, p. 2213-2226.
[9] Zhang, W., Zheng, N., Chen, Q., Yang, Y., Song, Z., Ma,T., Leng, J., and Guo, M.; URSA: Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds. ICPP ’20: 49th International Conference on Parallel Processing – ICPP. August 2020, p.1- 11.

Candidate profile:
Master 2 in Computer Science:
– Data & Knowledge Management Systems
– Distributed and Parallel Systems

Required education and skills:
Master 2 in Computer Science with the following requirements:

Distributed and Parallel Systems, Data Management Systems, Database Systems, Query Processing and Optimization, Cost Models, Cloud Systems, Programming Languages (e.g. C++, Java, Python).
The application should include the following documents (PDF format, see: http://www.edmitt.ups-tlse.fr/):
1- CV mentioning all your degrees
2- Motivation letter from the applicant explaining his/her choice of the proposed thesis subject
3- Recommendation letters
4- Details of your grades since you started higher education with ranking.

Applications in digital form (pdf) should be sent to: hameurlain@irit.fr
Application Deadline: March 14th, 2021
Start Date: October 1st, 2021.

Work address:
University: Paul Sabatier University, Toulouse 3
Research Laboratory: IRIT Institut de Recherche en Informatique de Toulouse
Team: PYRAMID (Dynamic Query Optimization in Large-scale Distributed Environments; https://www.irit.fr/PYRAMIDE/)
118, route de Narbonne
F-31062 TOULOUSE Cedex
FRANCE

Attached document: 202102261046_PhD Subject 2021_Predictive_Query_Optimization.pdf

Categories: theses

Self-regularised deep learning in the presence of limited data for medical imaging

Mar 20 @ 09:20 – 10:20

Offer linked to Action/Network: –/–

Laboratory/Company: ICube, University of Strasbourg
Duration: 3 years
Contact: lampert@unistra.fr
Publication deadline: 20/3/2020

Context:
The adoption of deep learning techniques in medical imaging applications has been limited by the availability of the large labelled datasets required for robust training, as well as the difficulty of explaining their decisions. This thesis will make contributions towards overcoming both of these limitations.

Subject:
It will achieve this by developing approaches to learn more robust representations using explainability. These approaches will be referred to as self-regularised deep learning in the presence of limited data. The problem of domain adaptation and learning domain-invariant representations in histopathological whole slide segmentation will be taken as the initial focus of this study, but this is open to being expanded during the project. Current approaches fail to achieve domain invariance because of the large domain shifts between histochemical and immunohistochemistry stainings.

An initial research direction will be to develop novel training mechanisms that are aware of, and therefore avoid, situations in which the network focusses only on limited parts of the salient information (as defined by the expert through a few manual annotations). These will force a more general representation to be learnt. The benefit is threefold: the model will be more generalisable, more domain invariant, and more amenable to transfer learning.

Candidate profile:
The position is open to both foreign and French students who hold a Master’s degree in Computer Science. French is not necessary, but the candidate must be confident in spoken and written English.

Required education and skills:
The candidate must have a good mathematical background and skills in machine learning (supervised and/or unsupervised). Experience in deep learning and representation learning would be a plus.

Work address:
ICube UMR 7357 – Laboratoire des sciences de l’ingénieur, de l’informatique et de l’imagerie
300 bd Sébastien Brant – CS 10413 – F-67412 Illkirch Cedex

Attached document: 202004301542_PhD_advert.pdf

Categories: theses

PhD offer: AI & Process Mining

Mar 30 – Mar 31 all-day

Offer linked to Action/Network: –/–

Laboratory/Company: LAMSADE, Université Paris-Dauphine-PSL
Duration: 3 years
Contact: daniela.grigori@dauphine.fr
Publication deadline: 2021-03-30

Context:
Process mining is a recent research topic that applies artificial intelligence and data mining techniques to process modelling and analysis [1,2]. The main idea is to extract knowledge from events recorded in an event log in order to discover, monitor and improve processes. Event logs store activities related to process instances, as well as additional information such as the resources executing the activities, data produced or used, timestamps, or costs.

Process mining approaches allow the discovery of the process model or its variants (a.k.a. discovery), the detection of deviations between the real process and the designed model (a.k.a. conformance checking), and the improvement of the process model based on the observed events (a.k.a. enhancement). Predictive process monitoring is a subfield of process mining that deals with predicting outcomes for running instances [3,4].

Most existing process mining and process monitoring approaches consider the process to be in a steady state, and so consider neither the context in which the process takes place nor the changes that may affect it while it is being analyzed [5,6]. Information about the context could be derived from the process log (resource occupation rate…) or captured from other sources of information that could enrich the log. Dealing with context information is important to detect and analyze changes [7,8] and is one of the research challenges described in the Process Mining Manifesto [9].

Subject:
The aim of this thesis is to consider the context in all the phases of the process improvement life cycle (discovery, conformance checking, enhancement) as well as in process monitoring. Including the context could improve the precision of the discovered process model and of its analysis, enabling better recommendations for process improvement and better predictions for process monitoring.

It will also make it possible to address fairness issues (e.g., not blaming an overloaded resource for delays) and to conduct causality analysis (e.g., which factor or context variable causes delays).

Towards a context-enhanced analysis of process-centric data, the following objectives should be addressed:
– Propose context-driven process discovery and conformance checking techniques
– Use context attributes to propose meaningful improvements
– Study what context attributes to monitor and how to identify when these attributes change
– Propose approaches to detect context changes online
– Propose predictive approaches with online learning to make sure that the process model is up to date

Candidate profile:
We seek an excellent and highly motivated student with a background in Computer Science.

Required education and skills:
Master in Computer Science
Required skills: knowledge of ML and graphs, programming skills

Work address:
https://www.lamsade.dauphine.fr/fileadmin/mediatheque/lamsade/documents/propositions_theses_2020/grigori.pdf

Categories: theses

Joint diagonalization and tensor decompositions for the early detection of Alzheimer's disease

Mar 31 – Apr 1 all-day

Offre en lien avec l’Action/le Réseau : – — –/Doctorants

Laboratoire/Entreprise : Institut Fresnel
Durée : 3 ans
Contact : remi.ANDRE@univ-amu.fr
Date limite de publication : 2021-03-31

Contexte :
La maladie d’Alzheimer est la maladie neurodégénérative la plus fréquente chez les personnes âgées. On estime qu’au moins 30 millions de personnes sont touchées par cette pathologie. Bien qu’il n’existe aucun traitement efficace à ce jour, on peut espérer retarder le début de la maladie et/ou atténuer les risques de la contracter en détectant suffisamment tôt des déficiences cognitives légères. Plusieurs modalités d’imagerie médicale telles que l’Imagerie par Résonnance Magnétique (IRM), l’IRM fonctionnelle ou encore la Tomographie par Emission de Positron (TEP) permettent d’identifier de manière précoce des changements se produisant dans le cerveau. L’examen TEP au FluoroDésoxyGlucose (TEP-FDG) est un outil puissant pour la détection précoce de la maladie d’Alzheimer. En effet, ce dernier est capable de mesurer la consommation de glucose dans le cerveau et permet ainsi d’observer des anomalies métaboliques avant que la structure anatomique du cerveau soit modifiée.
Les techniques d’aide au diagnostic clinique basées sur des approches d’apprentissage automatique sont aujourd’hui en plein essor. Un grand nombre de méthodes ont été développées particulièrement pour la détection de la maladie d’Alzheimer [3]. Ces méthodes se décomposent généralement en deux étapes : l’extraction d’attributs et la classification. L’extraction d’attributs étant utilisée en amont de la classification dans le but d’éliminer l’information redondante.

Sujet :
Le but étant de poursuivre l’effort de recherche de notre équipe sur le développement de méthodes d’Intelligence Artificielle pour la détection précoce de la maladie d’Alzheimer par analyse d’images TEP-FDG. Il s’agira alors principalement de développer et de mettre en oeuvre des algorithmes de diagonalisation conjointe de matrices et/ou de décompositions tensorielles le contexte médical précédemment décrit. (voir document en PJ pour plus de détails)

Profil du candidat :
Le candidat devra :
– maitriser les langages de programmation tels que MATLAB ou Python.
– maitriser l’anglais
– avoir un goût pour les sciences des données ainsi que pour les outils mathématiques et algorithmiques associés (algèbre, statistiques, optimisation…)

Une expérience dans le traitement d’images Biomédicales serait un plus.

IMPORTANT: Le candidat devra avoir une moyenne générale de 13/20 minimum lors de sa dernière année d’études.

Required education and skills:
Hold a five-year higher education degree (Bac+5, Master's level), or be in the final year of such a programme, in:
-Signal/image processing
-Data science
-Mathematics
-Computer science

Work address:
Institut Fresnel, Domaine Universitaire de Saint Jérôme, 13397 Marseille

Attached document: 202102260910_sujet_ed_i_fresnel_andre_wojak_adel_2021.pdf

Categories: theses

Artificial intelligence in the service of learner profiles: targeting, optimisation and adaptat

Mar 31 – Apr 1 all-day

Offer linked to Action/Network: DOING/Doctorants

Laboratory/Company: CNRS UMR SPE / Aflokkat
Duration: 3 years
Contact: bisgambiglia@univ-corse.fr
Publication deadline: 2021-03-31

Context:
The training organisation Aflokkat wishes to develop its own Learning Management System (LMS) in collaboration with the University of Corsica.
It should offer all the usual LMS features while also being intelligent and taking into account the specific characteristics of each learner (ILS: intelligent learning systems).
This research aims to refine the content suggestions made by the LMS and to provide individualised monitoring of learners. The software's artificial intelligence should make it possible to offer suitable content at the right time according to the learner's profile. Such a tool would bring considerable benefits to learners, trainers and the training organisation itself, enabling continuous improvement in the quality of the training provided. This work could play a fundamental role in improving the services offered by Aflokkat and is therefore of strong strategic interest to the company.

Subject:
Online learning, or e-learning, is taking a growing place in education, both as a complement to face-to-face courses and as the main mode of teaching. This is all the more true today given the crisis we are going through. In most cases, this learning takes place through dedicated platforms, learning management systems (LMS) or digital work environments (ENT, environnement numérique de travail), which offer teachers many features for creating and managing their online courses.

According to the definition given by Baumgartner, Häfele and Maier-Häfele, an LMS must support five types of operations: presenting content, providing communication tools (discussion forums, chat, videoconferencing), creating assignments and quizzes, assessing learner performance, and providing administrative support for courses and students. An LMS designed and developed in this way, such as Moodle, therefore focuses mainly on how to teach and less on the individual needs of learners.

Our objective is to develop an adaptive learning system and integrate it into an LMS in order to optimise learning according to the learner's profile (analysis of knowledge, preferences, aptitudes, etc.), offering several levels of adaptability: an individualised learning path, content sequencing based on the results obtained, and formative assessments.
Adaptive learning systems aim to adapt courses to the differences between learners, in particular their learning style, and to promote memory retention.

Candidate profile:
We are looking for an autonomous and highly curious candidate with strong computer science skills and a desire to explore the fields of learning and vocational training.

Required education and skills:
Bac+5 (Master's level) in computer science with a strong interest in machine learning, Big Data. …

Work address:
Ajaccio (Corsica)

Attached document: 202103101745_Fiche_Offre_Thèse_2021-22_CIFRE.pdf

Categories: theses

Coupling spectroscopy and high contrast imaging for exoplanet detection and characterisation

Apr 1 – Apr 2 all-day

Offre en lien avec l’Action/le Réseau : BigData4Astro/– — –

Laboratory/Company: Laboratoire d’Etudes Spatiales et d’Instrumentatio
Duration: 36 months
Contact: mickael.bonnefoy@univ-grenoble-alpes.fr
Publication deadline: 2021-04-01

Context:
Since the pioneering discoveries of disks and extrasolar planets in the mid-1990s, a new domain of astrophysics, exoplanetology, has emerged. It aims to explore the diversity of extrasolar planetary systems, to understand their formation and evolution, and ultimately to detect Earth-like planets. Today, high-contrast imagers fed by adaptive optics search for massive Jupiters beyond 5-10 astronomical units, where the giant planets of our own solar system reside. Combining them with medium- (R=λ/Δλ=1000-10 000) and high- (R=λ/Δλ>10 000) resolution spectrographs is furthermore seen as a key approach to remove the speckle noise that hinders the direct imaging and spectroscopic characterization of planets with ground-based telescopes. It also promises to yield new, accurate information on the physical and atmospheric properties of the planets, in particular on the elemental abundances (C/O, N/H, Fe/Si), which appear to be promising tracers of the formation pathway of these objects. At such resolutions, molecular absorptions start to be resolved in the planet spectra and can be distinguished from the dominant signal of the host star. Correlation techniques are presently used to produce a coherent signal out of the planet spectral signatures and make it possible to simultaneously enhance the detection (e.g., Birkby et al. 2017) and characterize the objects. The molecular mapping technique (Hoeijmakers et al. 2018; Petrus et al. 2020) applies these cross-correlation techniques to hyperspectral data produced by integral-field spectrographs.
A suite of advanced integral field spectrographs equipped with coronagraphs and fed by high-performance adaptive optics modules will enter operation in the coming decade (VLT/ERIS, GTC/FRIDA, VLT/MAVIS, ELT/HARMONI & METIS) and surpass the capabilities of present instruments (e.g., VLT/SINFONI, Keck/OSIRIS). This further motivates exploring the diversity contained in medium- and high-resolution spectroscopic data, in order to improve detection performance, properly extract the spectroscopic information of the planets, and characterize their physical and atmospheric properties.
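The cross-correlation idea mentioned above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the pipeline of the cited papers: the template, the line positions, the pixel shift and the sinusoidal ripple standing in for noise are all invented, and shifts are counted in pixels rather than in radial velocity.

```python
import math

def normalize(values):
    """Subtract the mean and scale to unit Euclidean norm."""
    m = sum(values) / len(values)
    centered = [v - m for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]

def cross_correlate(spectrum, template, max_shift):
    """Map each pixel shift in [-max_shift, max_shift] to its normalized correlation."""
    t = normalize(template)
    n = len(t)
    ccf = {}
    for shift in range(-max_shift, max_shift + 1):
        # Window of the spectrum aligned with the template at this shift.
        start = max_shift + shift
        window = normalize(spectrum[start:start + n])
        ccf[shift] = sum(a * b for a, b in zip(window, t))
    return ccf

# Hypothetical template: flat continuum with six absorption lines.
line_positions = [5, 11, 12, 20, 27, 33]
template = [0.5 if i in line_positions else 1.0 for i in range(40)]

# Toy observation: bright stellar continuum, a small deterministic ripple
# standing in for noise, and faint planetary lines shifted by +3 pixels.
max_shift, true_shift = 5, 3
spectrum = [100.0 + 0.005 * math.sin(1.7 * i) for i in range(40 + 2 * max_shift)]
for p in line_positions:
    spectrum[max_shift + true_shift + p] -= 0.05

ccf = cross_correlate(spectrum, template, max_shift)
best_shift = max(ccf, key=ccf.get)
```

Although each planetary line is 2000 times fainter than the continuum in this toy setup, the lines add up coherently at the correct shift, which is what makes the correlation peak stand out. Molecular mapping repeats this computation for every spatial pixel of an integral-field data cube.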

Subject:
The student will develop innovative methods to exploit the data diversity contained in medium- and high-resolution spectrographs (classical and integral field) to boost the detection capabilities of the instruments and retrieve quantitative information on the planet properties. The student will characterize the atmospheres of giant planets already known and those detected in the course of our survey as part of the COBREX ERC project. For each planet, the physical characteristics (effective temperature, surface gravity, pressure-temperature profiles) and the composition will be derived. The possibility of intervening dust (circumstellar or circumplanetary) will be considered. Comparative studies between the various planets will be made.
The PhD student will lead the development and validation of new algorithms applicable to integral field spectrographs at medium and high spectral resolution to improve the detection and characterization of giant planets (e.g., Petrus et al. 2020).
She/He will develop and test new approaches for inverting exoplanet spectra at medium and high spectral resolutions, measure the chemical abundances of the objects and constrain their atmospheric properties (cloud properties, bulk composition, temperature-pressure profiles). This work will rely on the use of grids of synthetic spectra computed from atmospheric models (Charnay et al. 2018). She/He will couple these inversion methods to the detection algorithms in a single tool capable of both detecting and characterizing any object simultaneously.
The student will have access to data from cutting-edge integral field (Keck/OSIRIS, VLT/SINFONI, Gemini/NIFS) and high-resolution spectrographs (CFHT/SPIRou, ESO/NIRPS, CRIRES+). She/He will exploit a new high-contrast mode of the ERIS integral field spectrograph (ERIS+) to be installed on the VLT and evaluate the benefits and limitations of that mode based on observed and simulated data.

The first year of the PhD will be spent at IPAG (Grenoble), where the data will be analyzed. The last two years will be spent at LESIA, Observatoire de Paris.
The PhD student will take part in observations to complement the available archival data, in particular with ESO’s telescopes and with JWST.

Candidate profile:
As a member of the ERC project COBREX, the student will be part of a lively team with expertise in exoplanetary systems (planets, disks), high contrast imaging, radial velocity, and astrometry. The student will also interact with machine learning experts from INRIA in Paris and GIPSA-Lab in Grenoble. Several PhD positions are open as part of the project and can be found here:
*https://bit.ly/3i8p71l
*https://bit.ly/3oF8Hjk
*https://bit.ly/3icTnYK
*https://bit.ly/3sBoeTO

The student should be highly motivated and able to propose and deal with complex mathematical concepts. We are therefore looking for a student with a strong background in data science and applied mathematics, and we can provide her or him with the necessary information on the astrophysical context of the research topic.

The applicant should have a good level of written and spoken English.

The ability to work in a team is essential.

Required education and skills:
The applicant should have a Master’s degree in Computer Science, Physics, or Astrophysics and advanced skills in data analysis/signal processing.

Knowledge of Python and associated key libraries (scipy, numpy, pandas, scikit-learn, keras, tensorflow) will be highly appreciated. Knowledge of the Julia language is a plus.

Work address:
LESIA
Observatoire de Paris, Section de Meudon
5, place Jules Janssen
92195 MEUDON Cedex

IPAG
414 Rue de la Piscine
38400 SAINT-MARTIN D’HERES

Categories: theses

CIFRE PhD offer: ECR Environnement & L@BISEN / ISEN Yncrea Ouest-Brest

Apr 2 – Apr 3 all-day

Offre en lien avec l’Action/le Réseau : – — –/Doctorants

Laboratory/Company: L@BISEN / ISEN Yncrea Ouest-Brest
Duration: 3 years
Contact: youssefmour@gmail.com
Publication deadline: 2021-04-02

Context:
During building scanning campaigns (3D point clouds) carried out to produce digital plans, two technological obstacles limit the automation of the task: 1. The size of the acquired data. 2. The lack of a direct link between the 3D points and the building's structures (beams, openings, pipes…).

As a result, the scan data cannot be used directly to produce plans, which forces an operator to recreate every structure by hand in Autocad. In particular, the operator must click on points assumed to form a straight line, a curve, and so on, and thereby build an Autocad vector plan by superimposition.
These operations take a very long time. To make them more efficient, re-engineering support tools need to be developed. After optionally selecting an area of the point cloud (e.g. a 3D building), we would like to analyse and process the data automatically (cleaning, 2D projection, segmentation, contour detection, filtering, …) in order to propose geometric figures from a set of approximating shapes.

To this end, classical and machine learning algorithms will have to be tested and adapted to analyse and recognise the structures emerging from the point cloud.
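As a rough illustration of the classical side of this processing chain, the sketch below projects a 3D cloud onto a 2D top view and then finds the straight line supported by the most points. It rests on assumptions made here for illustration: the cloud is a dozen hand-made points, and the line search is a consensus fit in the spirit of RANSAC that, for simplicity, tries every pair of points instead of random samples. It is not the method used by ECR Environnement or L@bISEN.

```python
from itertools import combinations
import math

def project_to_floor_plan(points_3d):
    """Drop the vertical (z) coordinate to get a 2D top view."""
    return [(x, y) for x, y, _z in points_3d]

def fit_line_by_consensus(points_2d, tolerance):
    """Return (inliers, (a, b, c)) for the line a*x + b*y + c = 0 with most support."""
    best_inliers, best_line = [], None
    for (x1, y1), (x2, y2) in combinations(points_2d, 2):
        a, b = y2 - y1, x1 - x2
        norm = math.hypot(a, b)
        if norm == 0.0:
            continue  # identical points define no line
        a, b = a / norm, b / norm          # unit normal, so distances are metric
        c = -(a * x1 + b * y1)
        inliers = [p for p in points_2d
                   if abs(a * p[0] + b * p[1] + c) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_inliers, best_line = inliers, (a, b, c)
    return best_inliers, best_line

# Toy cloud: a wall along y = 2 scanned at several heights, plus three stray points.
wall = [(float(x), 2.0 + 0.01 * ((x % 3) - 1), 0.5 * (x % 4)) for x in range(10)]
clutter = [(1.0, 0.0, 1.0), (4.0, 5.0, 0.2), (7.5, 3.5, 1.5)]
cloud = wall + clutter

inliers, line = fit_line_by_consensus(project_to_floor_plan(cloud), tolerance=0.05)
```

Here the returned inliers are the wall points and the stray points are rejected, which is the kind of candidate segment an operator could then accept instead of clicking each point by hand. Trying every pair is quadratic in the number of points, so a real scan would call for random sampling or a dedicated library.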

Subject:
Your research activities will involve:

● Integration into the research teams of ISEN Yncréa Ouest.
● Proposing innovative methods for processing 2D and 3D data.
● Contributing to international-level scientific publications.
● Creating and deploying prototypes.

Methodological and technical approaches envisaged:

● State of the art on the processing/segmentation of 3D point clouds

● In-depth, critical study of methods from the literature for processing/segmenting 3D point clouds using deep learning

● Implementing some of these methods in Python will be a plus for a good understanding of existing approaches.

● Proposal of new approaches that improve on existing ones

● Development of an operational system for building a digital model from a 3D point cloud

● …

Références succinctes :

[1] Yulan Guo, Hanyun Wang, Qingyong Hu∗, Hao Liu∗, Li Liu, and Mohammed Bennamoun. (2019), “Deep Learning for 3D Point Clouds: A Survey”.arXiv preprint arXiv:1912.12033

[2] Qi, C. R., Su, H., Mo, K., & Guibas, L. J. (2017), ”Pointnet: Deep learning on point sets for 3d classification and segmentation”. In Proceedings of the IEEE CVPR (pp. 652-660).

[3] Macher, H. (2017). Du nuage de points à la maquette numérique de bâtiment: reconstruction 3D semi-automatique de bâtiments existants (Doctoral dissertation, Université de Strasbourg).

[4] Macher, Hélène, Tania Landes, and Pierre Grussenmeyer. “POINT CLOUDS SEGMENTATION AS BASE FOR AS-BUILT BIM CREATION.” ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences 2 (2015).

[5] Macher, Hélène, Tania Landes, and Pierre Grussenmeyer. “From point clouds to building information models: 3D semi-automatic reconstruction of indoors of existing buildings.” Applied Sciences 7.10 (2017): 1030.

[6] Macher, Hélène, et al. “Semi-automatic segmentation and modelling from point clouds towards historical building information modelling.” Euro-Mediterranean Conference. Springer, Cham, 2014.

Candidate profile:
For this thesis, we are looking for a highly motivated young doctoral student with a strong taste for innovation. He/she will take part in developing prototypes to be integrated into the solutions used by ECR Environnement and L@bISEN Yncréa Ouest.

On the research side, the candidate must hold a Master's degree with skills in computer vision. He/she must also have experience in machine learning and/or data mining. Experience in 3D data analysis would be a plus.

Required education and skills:
On the development side, the candidate must be proficient in object-oriented, structured and algorithmic programming. Proficiency in Python and C is a prerequisite; knowledge of C++ and of the OpenCV library would be a plus.

A good level in mathematics is naturally required, in particular successful experience with machine learning techniques (ideally neural networks).

Sufficient open-mindedness to fit into a new team is required, so as to quickly absorb unfamiliar professions and environments and understand what is at stake.

Work address:
Lorient/Brest, France

Attached document: 202102221551_Profil Thèse-CIFRE-ECR-ISEN.pdf

Categories: theses

Predictive Query Optimization for Multi-tenant Cloud DBMSs

Apr 5 – Apr 6 all-day

Offre en lien avec l’Action/le Réseau : – — –/Doctorants

Laboratory/Company: IRIT Institut de Recherche en Informatique de Toul
Duration: 3 years
Contact: hameurlain@irit.fr
Publication deadline: 2021-04-05

Candidate profile:

Master 2 in Computer Science:
– Data & Knowledge Management Systems
– Distributed and Parallel Systems.

Required education and skills:

Master 2 in Computer Science with the following requirements:

Distributed and Parallel Systems, Data Management Systems, Database Systems, Query Processing and Optimization, Cost Models, Cloud Systems, Programming Languages (e.g. C++, Java, Python).

The application should include the following documents (PDF format, see: http://www.edmitt.ups-tlse.fr/):
1- CV mentioning all your degrees
2- Motivation letter from the applicant explaining his/her choice of the proposed thesis subject
3- Recommendation letters
4- Transcripts of all your grades since the start of higher education, with rankings.

Applications in digital form (pdf) should be sent to: hameurlain@irit.fr
Application Deadline: March 31st, 2021
Start Date: October 1st, 2021.

Work address:
Paul Sabatier University, Toulouse 3
IRIT Institut de Recherche en Informatique de Toulouse
Team: PYRAMID (Dynamic Query Optimization in Large-scale Distributed Environments); https://www.irit.fr/PYRAMIDE/
118, Route de Narbonne
F-31062 TOULOUSE Cedex
FRANCE

Attached document: 202103171000_PhD Subject 2021_Predictive_Query_Optimization.pdf

Categories: theses

Classification of photographic images for spatio-temporal monitoring of restoration sites