General Overview

MaDICS is a CNRS Research Group (Groupement de Recherche, GDR) created in 2015. It offers an ecosystem to promote and coordinate interdisciplinary research activities in Data Science. It is a forum for exchange and support for scientific and non-scientific stakeholders (industry, media, culture, …) facing Big Data and Data Science problems.
Learn more…


MaDICS activities are structured into Actions and Workshops (Ateliers). Actions bring together the stakeholders of a specific theme for a limited period (between two and four years). The creation of an Action is preceded by one or more Workshops that help consolidate the themes and objectives of the upcoming Action.


The MaDICS website offers several support and communication tools open to the community concerned with Data Science:

  • MaDICS Events: The GDR MaDICS grants its label to events such as conferences, workshops or summer schools. Every labelling request is evaluated by the GDR Steering Committee. A label makes financial support possible for young researchers. A labelling request may also be accompanied by a request for financial support covering the travel of speakers or participants at the event.
    Learn more…
  • MaDICS Networks: to better target research activities related to training and innovation, the GDR MaDICS has set up a Training Network aimed at various audiences (young researchers, continuing education, …), an Innovation Network to facilitate and intensify the dissemination of Big Data and Data Science research to industrial stakeholders, and a Partners' Club whose members support and take part in the GDR's activities.
    Learn more…
  • PhD Students' Space: PhD students and young researchers are an essential driving force of research, and the GDR offers grants for mobility and for participation in MaDICS events.
    Learn more…
  • Communication tools: The MaDICS website disseminates various information (events, job offers, PhD proposals, …) related to the GDR's research themes. This information is sent to all subscribers of the MaDICS mailing list and published in a public Calendar (events) and on a job offers page.

Membership of the GDR MaDICS: Membership is free for members of public research laboratories or institutions. Other people can join on behalf of their company or individually by paying an annual fee.
Learn more…


Upcoming Events

Study Days, Schools, Conferences and Seminars

Actions, Workshops and Working Groups:

DAE DatAstro DSChem EXMIA GRASP RECAST SaD-2HN SIMDAC SimpleText TIDS  


Thu, Apr 1, 2010
Evolution de réseaux sociaux personnels en ligne : nouvelles techniques et comparaisons
Apr 1 – Apr 2 all-day

Related Action/Network: none

Laboratory/Company: ETIS Laboratory
Duration: 5
Contact: claudia.marinica@ensea.fr
Publication deadline: 2010-04-01

Context:
6-month internship at the ETIS Lab, UMR 8051 (Paris area)
(University Paris Seine, University of Cergy Pontoise, ENSEA, CNRS)

Title
Evolution of personal online social networks: new techniques and comparisons

Supervisors
Claudia Marinica, ETIS lab, MIDI team
https://perso-etis.ensea.fr//marinica/

Subject:
Description

Nowadays, Online Social Networks (OSNs) allow users to be in direct contact and to exchange information, messages, etc.; these networks evolve as the lives of their users evolve. Moreover, a social network can be seen as a group of several smaller networks, each centered on one individual. These small networks are called Online Personal Networks (OPNs) [1], because they are composed of a central individual (the ego) and several additional individuals (the alters) that are connected to the ego directly or indirectly.
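To make the ego/alter definition concrete, here is a minimal sketch of extracting a personal network from an edge list. It assumes that "connected directly or indirectly" means reachable from the ego within a fixed number of hops (`radius`); the function name `extract_opn` and this radius-based definition are illustrative assumptions, not the definitions of [1].

```python
from collections import deque

def extract_opn(edges, ego, radius=2):
    """Return (nodes, edges) of a personal network centred on `ego`.

    Assumption (illustrative only): alters are the nodes reachable from the
    ego in at most `radius` hops; radius=1 keeps direct contacts only.
    """
    # Undirected adjacency sets built from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Breadth-first search from the ego, not expanding beyond `radius` hops.
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)

    nodes = set(dist)
    # Keep only the edges whose two endpoints both belong to the personal network.
    opn_edges = {(u, v) for u, v in edges if u in nodes and v in nodes}
    return nodes, opn_edges


if __name__ == "__main__":
    toy = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")]
    print(extract_opn(toy, "a", radius=1))  # ego "a" with its direct contacts
    print(extract_opn(toy, "a", radius=2))  # adds the indirect contact "c"
```

Taking one such extraction per snapshot of the OSN gives the sequence of OPN states whose evolution is studied.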
Like classical OSNs, OPNs evolve over time, but since their study is very recent, it is still unclear whether their evolution is comparable to that of OSNs. In our most recent studies [2], we analyzed the evolution of collaboration networks (such as DBLP, https://dblp.uni-trier.de/xml/) by analyzing how the values of several metrics evolve. We are currently proposing a specific evolution model.
This internship addresses two issues: (1) studying OPNs with other techniques in order to compare and validate the results obtained in our previous studies, and (2) making the tools developed so far available to the community.
The first issue concerns the use of data mining techniques to assess the evolution of OPNs. Indeed, a specific representation of a social network at time t could make it possible to extract information such as “if a network gains 2 new nodes at time t, then it also tends to gain 2 new nodes at time t+1”. In this context, several challenges can be outlined, such as the choice of the data mining technique and the modelling of the data needed to apply the chosen technique. These challenges are closely related, and several techniques can be used, depending on the expected result.
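A rule of the kind quoted above can be scored with the usual support and confidence measures. The sketch below is a hypothetical illustration, not the project's methodology: it assumes each OPN is summarized by its node count at successive time steps, and that the evolution "item" is simply the number of new nodes between two consecutive snapshots. The names `growth` and `rule_stats` are made up for the example.

```python
def growth(sizes):
    """Number of new nodes between consecutive snapshots of one OPN."""
    return [later - earlier for earlier, later in zip(sizes, sizes[1:])]


def rule_stats(histories, a, b):
    """Support and confidence of 'growth a at t  =>  growth b at t+1'.

    `histories` maps an OPN identifier (e.g. the ego) to its node counts
    at t = 0, 1, 2, ... (an assumed, deliberately simple data model).
    """
    transitions = antecedent = both = 0
    for sizes in histories.values():
        deltas = growth(sizes)
        # Each pair (growth at t, growth at t+1) is one observed transition.
        for d_t, d_next in zip(deltas, deltas[1:]):
            transitions += 1
            if d_t == a:
                antecedent += 1
                if d_next == b:
                    both += 1
    support = both / transitions if transitions else 0.0
    confidence = both / antecedent if antecedent else 0.0
    return support, confidence


if __name__ == "__main__":
    toy = {"ego1": [3, 5, 7, 8], "ego2": [4, 6, 8, 10], "ego3": [2, 4, 5, 5]}
    # 6 transitions, 5 start with +2, 3 of those continue with +2:
    print(rule_stats(toy, 2, 2))  # support 0.5, confidence 0.6
```

Choosing a richer representation (several metrics, discretized values, graph patterns) is precisely the modelling challenge mentioned in the text.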
The second issue concerns the development of a tool for analyzing the evolution of OPNs selected from an OSN. This tool would be used by experts who study a specific OSN and want to understand how its OPNs evolve over time. In this part, existing developments (produced during an ongoing PhD) can and should be reused, and completed with the proposals made during the internship.

Objectives

In recent years, a number of studies on social network analysis have been dedicated to understanding how these networks evolve over time. These works tried to develop generative models for large networks that reproduce the properties of real networks, such as being scale-free (degrees follow a power law), a high clustering coefficient, and short average path lengths (known as the small-world phenomenon [3]).
Nevertheless, studies on the evolution of personal networks are quite limited. For example, some works tried to understand whether online personal networks are comparable to the offline ones studied in sociology [5]; these works focused on the evolution of each level of alters in a personal network, using conclusions from studies on the cognitive load of an individual [4].
Although these conclusions are important and can be used to modify the evolution of these types of relations, they do not explain the evolution of the structure of a personal network. In [2], we propose a methodology to study the evolution of personal networks based on the evolution of a set of metrics.
This internship has two objectives:
1/ The first is to propose a new methodology for analyzing the evolution of OPNs, which will allow us to compare it with the results already obtained in our previous studies. The new methodology will integrate data mining techniques. We propose to use frequent pattern mining, because it makes it possible to detect frequent patterns in the evolution of OPNs. In this context, a review of the state of the art is needed, as well as a study to choose a specific type of pattern among the existing ones: transactional, sequential, graph, etc.
2/ The second concerns the development of an online tool accessible to the community and to experts. The tool should mainly:
• Display the OSN;
• Select the ego and other parameters allowing to extract one or several OPNs;
• Display the OPNs;
• Select the metric to compute on the chosen OPNs;
• Display the values of the metrics for the OPNs;
• Display the result of the data mining technique;
• Compare the two previous results.

References

[1] Sarah Djemili, Claudia Marinica, Maria Malek, Dimitris Kotzinos (2016). A Definitions’ Framework for Personal/Egocentric Online Social Networks. 7ème conférence sur les modèles et l’analyse des réseaux: Approches mathématiques et informatiques (MARAMI’16).
[2] Sarah Djemili, Claudia Marinica, Maria Malek, Dimitris Kotzinos. Personal Networks of Scientific Collaborators: A Large Scale Experimental Analysis of Their Evolution. Information Search, Integration, and Personalization. Communications in Computer and Information Science, 760, Springer, Cham, 2017.
[3] Travers, Jeffrey and Milgram, Stanley. The small world problem. Psychology Today. pages: 61-67. 1967.
[4] Arnaboldi, Valerio and Conti, Marco and Passarella, Andrea and Dunbar, Robin. Dynamics of personal social relationships in online social networks: a study on twitter. Proceedings of the first ACM conference on Online social networks. pages: 15-26, 2013.
[5] Sutcliffe, Alistair and Dunbar, Robin and Binder, Jens and Arrow, Holly. Relationships and the social brain: integrating psychological and evolutionary perspectives. British Journal of Psychology. pages: 149-168. 2012.

Candidate profile:
Candidate
We are looking for a second-year Master's student (M2) with knowledge of social networks and/or data mining techniques.

Application info
The position will be open until filled. Starting date before 01/04/2019. Apply by sending your CV, recommendation/motivation letters and grades for at least the 1st and 2nd year of the Master (M1 and M2) to:
Claudia.Marinica@u-cergy.fr

Required education and skills:
Second-year Master's student (M2) with knowledge of social networks and/or data mining techniques

Work address:
2 avenue Adolphe Chauvin, 95300, Cergy-Pontoise

Attached document: sujet_stage_eng.pdf

Sun, Feb 7, 2016
Call for L3 internship topics, ENS de Lyon
Feb 7 – Feb 6 all-day

Related Action/Network: none

Laboratory/Company: computer science laboratory
Duration: 6 weeks
Contact: colin.riba@ens-lyon.fr
Publication deadline: 2016-02-07

Context:
At the end of their first year of courses (L3), computer science
students at the École Normale Supérieure de Lyon must complete
a six-week internship (sometime between June and the end of
August) in an academic or affiliated research team. We invite you
to propose an internship topic and/or to circulate this request
among your colleagues.

Subject:
Information about these internships is available online:
http://www.ens-lyon.fr/DI/stageL3/

Internship proposals are also submitted online:
http://www.ens-lyon.fr/DI/stageL3/submit.php

– Opening: December 15, 2015
– Closing: February 7, 2016

Candidate profile:
Please feel free to forward this information to colleagues who
may be interested.

Thank you for any help you can provide in organizing these internships,

Best regards,

Colin Riba

Required education and skills:
NA

Work address:
NA

Attached document:

Tue, Mar 1, 2016
Master 2 or engineering school internship: Distributed databases on a Big Data architecture.
Mar 1 – Feb 29 all-day

Related Action/Network: none

Laboratory/Company: CEA LIST (DM2I/LADIS)
Duration: 6 months
Contact: lorene.allano@cea.fr
Publication deadline: 2016-03-01

Context:
CEA LIST (http://www-list.cea.fr/) is an innovation and technological research center focused on new information and communication technologies. CEA LIST is located on the Saclay plateau and is part of Paris Saclay University. Within CEA LIST, our laboratory is dedicated to Data Analytics and Big Data.

Subject:
Distributed databases on a Big Data architecture.

For this Master 2 internship, we are looking for an autonomous candidate with an R&D engineering profile, motivated by new technologies and Big Data architectures, to develop a distributed database environment as part of a translational research project. The internship will first benchmark distributed and NoSQL database management systems, and then deploy the system that best fits the project's requirements on our Big Data architecture. Development will preferably be done in Python.

Candidate profile:
Master 2 student or final-year engineering school student with a Big Data profile.

Required education and skills:
Python, Spark, Hadoop, NoSQL

Work address:
centre de Saclay – Digiteo Labs
DM2I / LADIS – PC 192
91191 Gif sur Yvette Cedex
http://www-list.cea.fr/

Attached document:

Thu, Mar 31, 2016
Repairing SQL queries to retrieve missing answers
Mar 31 – Apr 1 all-day

Related Action/Network: none

Laboratory/Company: ETIS
Duration: 6 months
Contact: Dimitrios.Kotzinos@u-cergy.fr
Publication deadline: 2016-03-31

Context:
The internship will take place in the MIDI team of the ETIS Lab (ENSEA / UCP / CNRS UMR 8051) based in the area of Cergy Pontoise, just outside Paris.

The internship will have a net salary of around 508 euros/month and a duration of up to 6 months, starting in March or April 2017.

Interested candidates are requested to send a detailed CV, one recommendation letter and university/master transcripts to Katerina Tzompanaki at atzompan@u-cergy.fr.

Subject:
Repairing SQL queries to retrieve missing answers.

The growing volume of data produced today comes with an increasing need for complex data transformations, which developers design to process or integrate these data. These transformations, commonly specified declaratively as queries, may fail to produce all the expected results, leading to what we call missing data. Understanding why missing data occur, and how the original query can be modified to overcome these causes, can be tricky if done manually. In the context of relational databases, [1] proposed a novel way (Why-Not polynomials) to explain missing data for a given query. Building on this, [3] described a first approach that uses Why-Not polynomials to effectively repair the query, while [2] presents a prototype implementing these algorithms. As the query-repair phase of the framework depends heavily on the size of the database and the complexity of the Why-Not polynomial, a more efficient solution needs to be devised, either by improving the existing algorithm or by proposing a new one. This will be the focus of the Master's internship.

More specifically, the candidate is expected to:

1) Verify/Identify the bottlenecks of the existing solution, algorithmically and experimentally.
2) Propose improvements of the algorithm.
3) Implement the new improved algorithm and experimentally demonstrate its efficiency.
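To give an intuition of the missing-answer problem, the sketch below handles the simplest possible case: a single-table query whose WHERE clause is a conjunction of comparisons. It is not the Why-Not polynomial approach of [1] and [3]; it only lists which conditions reject an expected tuple and then drops them. This naive repair also shows why a careful refinement is needed, since relaxing a condition can admit unwanted answers. The table, attribute names and functions are hypothetical.

```python
import operator

# Comparison operators allowed in the (hypothetical) conjunctive WHERE clause.
OPS = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge}


def satisfies(row, conditions):
    """True if the row passes every (attribute, operator, value) condition."""
    return all(OPS[op](row[attr], value) for attr, op, value in conditions)


def blocking_conditions(row, conditions):
    """Conditions that reject the expected (missing) row."""
    return [(attr, op, value) for attr, op, value in conditions
            if not OPS[op](row[attr], value)]


def naive_repair(rows, conditions, expected_id):
    """Drop the blocking conditions; return (blocking, repaired query, answers)."""
    expected = next(r for r in rows if r["id"] == expected_id)
    blocking = blocking_conditions(expected, conditions)
    repaired = [c for c in conditions if c not in blocking]
    answers = [r["id"] for r in rows if satisfies(r, repaired)]
    return blocking, repaired, answers


if __name__ == "__main__":
    rows = [{"id": 1, "city": "Paris", "price": 80},
            {"id": 2, "city": "Lyon", "price": 60},
            {"id": 3, "city": "Paris", "price": 120},
            {"id": 4, "city": "Paris", "price": 150}]
    query = [("city", "=", "Paris"), ("price", "<", 100)]
    print([r["id"] for r in rows if satisfies(r, query)])  # [1]
    # Why is tuple 3 missing? Because of price < 100.
    print(naive_repair(rows, query, expected_id=3))
```

In this toy run the repaired query returns tuples 1, 3 and 4: tuple 4 is an unwanted side effect, whereas changing the bound to `price <= 120` would have recovered tuple 3 alone. Finding such minimal refinements efficiently, for joins as well as selections, is the kind of problem the internship targets.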

References

[1] Bidoit, Nicole, Melanie Herschel, and Aikaterini Tzompanaki. “Efficient computation of polynomial explanations of why-not questions.” Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. ACM, 2015.
[2] Bidoit, Nicole, Melanie Herschel, and Katerina Tzompanaki. “EFQ: Why-not answer polynomials in action.” Proceedings of the VLDB Endowment 8.12, 2015.
[3] Bidoit, Nicole, Melanie Herschel, and Katerina Tzompanaki. “Refining SQL Queries based on Why-Not Polynomials.” 8th USENIX Workshop on the Theory and Practice of Provenance (TaPP 2016). 2016.

Candidate profile:
The candidate should be an M2-level student.

Required education and skills:
The candidate should have solid knowledge of the Java programming language. Familiarity with the SQL query language and relational databases is desirable.

Work address:
MIDI team
ETIS Lab (ENSEA / UCP / CNRS UMR 8051)
Site St. Martin
2 Av. Adolph Chauvin
95000 Pontoise

Attached document:

Fri, Apr 1, 2016
Internship offer – Master 2: designing user profiles from browsing traces
Apr 1 – Apr 2 all-day

Related Action/Network: none

Laboratory/Company: LINA/POLYTECHNANTES
Duration: 6 months
Contact: antoine.pigeau@univ-nantes.fr
Publication deadline: 2016-04-01

Context:
The Open Class Room website provides online courses in various areas, from art and culture to computer science. A course is composed of texts, videos or ebooks that users browse, read or download after registering. A course is validated through exercises and quizzes.
Validating a course may lead to a certificate if the user has chosen a premium registration.

The success of Open Class Room makes many user profiles available, together with their associated traces on the courses:
• personal information is provided during registration: each user gives their name, gender, skills, grade, job and the courses taken;
• the traces contain all the actions carried out by the users on both the client and server sides.

Each user's accesses to every course, part of a course or chapter, quiz and exercise are recorded. The goal of the project is to study the reasons for students' failure or success.
Answering this question is of great interest to both the course designers and the managers of the Open Class Room website. The objective of such a study is to improve the design of the courses and to be able to anticipate a student's failure.

Subject:
The detailed subject is available on the “positions > Master Topics” page of the Duke team website:
http://duke.univ-nantes.fr/wp-content/uploads/2015/12/StageHubble-DUKe.pdf

The objective of the internship is the modelling of user profiles based on their personal information and on the way they browse/learn a course.

Our input data are the user's personal information and traces, and the output is a set of user profiles. Such a profile will summarize the background, interests and learning methods of a user (or a group of users).

The following research areas will be studied to generate the profiles:
• process modelling: the set of user traces on the same course is summarized to highlight the main steps for learning a course, from the beginning to the validation exercises;
• pattern mining and sequential pattern mining: searching for sequence similarities on a specific course or a set of courses. For instance, a pattern obtained from users with high marks could be relevant for promoting a good practice;
• user clustering: searching for groups of users with similar backgrounds and similar ways of browsing/learning the courses. The clustering could be derived from the two previous points.
Groups of users would be defined by a similar process or similar frequent patterns.

Each data mining process can be applied to a subset of users, defined by specific values of their personal information, for example skills and grades. The objective here will be to study the correlations between sets of users with different properties.

Candidate profile:
The only requirements concern education and skills.

Required education and skills:
Required level: Master 2

The candidate should have skills in the following areas:
– Clustering
– pattern mining
– process modelling

Work address:
PolytechNantes

Attached document: stagehubble-duke.pdf

Sat, Apr 30, 2016
A data flow comparison among different Distributed Frequent Itemset Mining Algorithms over the MapReduce platform
Apr 30 – May 1 all-day

Related Action/Network: none

Laboratory/Company: ETIS – ENSEA / Université de Cergy-Pontoise / CNRS
Duration: 6 months
Contact: Tao-Yuan.Jen@u-cergy.fr
Publication deadline: 2016-04-30

Context:
Position: Master's / Engineering internship

Place: Paris Area, Université de Cergy-Pontoise, Cergy-Pontoise, France

Subject: A data flow comparison among different Distributed Frequent Itemset Mining Algorithms over the MapReduce platform

Period: 6-month internship from April/May to September/October 2015 – approx. 508€/month

For further information on the internship subject please contact:
Tao Yuan Jen

Subject:
Description: This internship subject deals with two research fields: Data Mining and Cloud Computing.

The objectives of the internship are:
(1) to implement, or find the source code for, the following Distributed Frequent Itemset Mining Algorithms over the MapReduce platform:
MRApriori algorithm, IMRApriori algorithm, SPC and DPC algorithms, DPFPM algorithm, Mreclat algorithm, and Apriori-V algorithm.
(2) to compare, across these algorithms, the mining performance, the quantity of data distributed to each data node before mining, and the quantity of data communicated among nodes during mining.
(3) to develop, or find the source code for, a vertical data layout bitmap converter, if necessary, to prepare the data sets for the different experiments.
(4) to study and implement, if time permits, some improvements to the Apriori-V algorithm.
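Item (3) mentions a vertical bitmap layout. The sketch below, assuming in-memory data on a single machine, converts horizontal transactions into one bitmap per item (a Python integer whose bit i is set when transaction i contains the item) and runs a level-wise Apriori search in which the support of an itemset is the number of set bits in the AND of its item bitmaps. It shows neither the MapReduce distribution nor the Apriori-V algorithm itself; all function names are hypothetical.

```python
from itertools import combinations


def to_vertical_bitmaps(transactions):
    """Horizontal layout (item list per transaction) -> {item: bitmap}."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support_count(itemset, bitmaps):
    """Support of a non-empty itemset: popcount of the AND of its bitmaps."""
    items = list(itemset)
    bits = bitmaps.get(items[0], 0)
    for item in items[1:]:
        bits &= bitmaps.get(item, 0)
    return bin(bits).count("1")


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search on the vertical bitmaps."""
    bitmaps = to_vertical_bitmaps(transactions)
    # Level 1: frequent single items.
    level = {}
    for item in bitmaps:
        s = support_count([item], bitmaps)
        if s >= min_support:
            level[frozenset([item])] = s
    result = dict(level)
    k = 2
    while level:
        prev = list(level)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level
                             for sub in combinations(c, k - 1))}
        level = {}
        for c in candidates:
            s = support_count(c, bitmaps)
            if s >= min_support:
                level[c] = s
        result.update(level)
        k += 1
    return result


if __name__ == "__main__":
    tx = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, s in sorted(frequent_itemsets(tx, 3).items(),
                             key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), s)
```

In a distributed setting, each node would typically hold a slice of the transactions (or of the bitmaps), and the amount of data shipped between nodes to combine partial counts is exactly what item (2) proposes to measure.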

This internship will help to:
1. understand the different ways in which the main types of Distributed Frequent Itemset Mining Algorithms over the MapReduce platform work;
2. clarify the uses and the flow of different data types in Distributed Frequent Itemset Mining Algorithms over the MapReduce platform;
3. plan the future development and improvements of our ongoing studies on the Apriori-V algorithm in this domain.

The internship is available immediately, will take place at the ETIS Lab (ENSEA / UCP / CNRS UMR 8051) located at Cergy Pontoise in the Paris area and will last for 6 months.

Candidate profile:
Engineer/Master 2

Required education and skills:
The candidate should be familiar with data mining techniques and the MapReduce platform.

Work address:
2 avenue Adolphe-Chauvin
BP 222, Pontoise
95302 Cergy-Pontoise cedex

Attached document: