Big Data Series Analytics in the Context of Environmental Crowd Sensing

When:

23/10/2018 – 24/10/2018 all-day

Announcement related to the Action/Network: PhD Students

Laboratory/Company: DAVID Lab – University of Versailles
Duration: 3 years
Contact: Karine.Zeitouni@uvsq.fr
Publication deadline: 2018-10-23

Context:
With the recent development of advanced computing and communication technologies, the world is witnessing the rise of the so-called Internet of Things (IoT). IoT envisions a world where everything is connected, from humans and computing devices to animals, vehicles, and even the smallest appliances. Sensors and actuators are fitted to things, enabling them to sense, generate data, communicate, act, and share information. This is leading to the generation of massive amounts of data, now referred to as Big Data, or Big Sensing Data in the IoT context. Given the great potential embedded in these data, both industry and academia are rushing to develop methods and technologies that can not only handle such large volumes of data but also exploit them to mine new knowledge and insights.
One application of IoT is air pollution monitoring. Several research initiatives have used fixed air pollution sensors to monitor air quality [1]. However, fixed sensors fall short in modeling air quality because of the high spatiotemporal variability of air pollutants. That is why the community is shifting toward a new monitoring paradigm, namely mobile crowd sensing, which empowers volunteers to contribute data acquired by their personal sensor-enhanced mobile devices [2]. This is enabled by emerging low-cost, lightweight air pollution sensor boxes, which can be carried by pedestrians and cyclists or mounted on vehicles. Opportunistic air quality monitoring takes advantage of existing mobile infrastructure or people's ordinary daily routines to perform monitoring [3].
This paradigm has several advantages over conventional monitoring techniques. First, it promotes personalization: each individual can gain insights into his/her own exposure. Second, it measures both indoor and outdoor environments (home, work, transportation, streets, parks, etc.) and expands the spatial coverage, depending on the participants' whereabouts. Finally, it provides insights at a higher resolution along the participants' trajectories, making it possible to capture local variability and pollution peaks. Nevertheless, the main limitation of opportunistic sensing arises from its uncontrolled sampling, which leads to highly uneven data density across regions and times of day. Mining such inhomogeneous samples raises unique challenges that we intend to tackle in this thesis. For the study of daily exposure, typical exposure profiles could be mined from the longitudinal data set. However, there is a gap to fill between the raw sensor data series and such high-level profiles.

Subject:
While the mobile crowd sensing paradigm has opened the door to new possibilities, it has also raised new challenges [2]. Indeed, the nomadic nature of the sensors and their combination (air pollution is often monitored using multi-sensor devices) call for revisiting the traditional methods of data mining and knowledge extraction. These sensors typically produce multivariate time series in which one variable is the geographical position of the device (we call these complex data series). Exploiting such complex data series for analytical purposes, such as exploratory analysis using data mining techniques, is far from straightforward. Since raw sensor data are mostly noisy and acquired at irregular (and asynchronous) frequencies, applying state-of-the-art methods for time series analysis and mining directly is insufficient. Besides, to take full advantage of these data, they should not only be analyzed in isolation, but also matched with their context and analyzed along multiple dimensions and scales (e.g., spatial, user, micro-environment, and time dimensions). One of the challenges is therefore how to move from raw, heterogeneous complex data series to such high-level models.
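As an illustration of the irregular and asynchronous sampling mentioned above, the following minimal sketch bins two sensor streams onto a common regular time grid and joins them. It is only one simple preprocessing option, not the method proposed in the thesis. The stream names (`pm25`, `no2`), the readings, the 60-second step, and the helper functions `to_regular_grid` and `align` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def to_regular_grid(samples, step_s):
    """Average irregular (timestamp_s, value) samples into fixed-width time bins."""
    bins = defaultdict(list)
    for t, value in samples:
        bins[int(t // step_s)].append(value)
    # Key each bin by its start time in seconds.
    return {b * step_s: mean(vals) for b, vals in bins.items()}


def align(streams, step_s):
    """Join several binned streams on bin start time; missing bins become None."""
    gridded = {name: to_regular_grid(s, step_s) for name, s in streams.items()}
    all_times = sorted(set().union(*(g.keys() for g in gridded.values())))
    return [
        {"t": t, **{name: g.get(t) for name, g in gridded.items()}}
        for t in all_times
    ]


# Hypothetical readings: (seconds since start, value), sampled asynchronously.
streams = {
    "pm25": [(3, 12.0), (41, 14.0), (95, 30.0)],
    "no2": [(10, 20.0), (130, 26.0)],
}
rows = align(streams, step_s=60)
for row in rows:
    print(row)
```

Bins where a sensor reported nothing are kept as `None` rather than interpolated, so the uneven density of the original samples stays visible to later analysis steps.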
Moreover, fully exploiting the personalization aspect of opportunistic mobile sensing enables individuals to relate air pollution to themselves [4] and to act upon the insights gained. For example, an individual may change his/her daily routes, means of transportation, or even activities for the sake of lower exposure and reduced health effects. Nonetheless, this requires building individual profiles and correlating them with personal health data and activities. This correlation opens the way to highlighting potential causal relations, or to inferring exposure from an activity profile or a planned route.
In this thesis, we aim to develop data mining methods adapted to opportunistic samples of geodated series along with their associated contextual data on the one hand, and to study multi-dimensional exploratory analysis and aggregation of such data on the other hand.
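To make the idea of multi-dimensional aggregation concrete, here is a minimal sketch that averages a measurement over any chosen subset of dimensions. The record fields (`user`, `env`, `hour`, `pm25`), the values, and the `roll_up` helper are hypothetical and serve only as an illustration; the thesis does not prescribe this representation.

```python
from collections import defaultdict


def roll_up(records, dims, measure):
    """Return the mean of `measure` for each combination of values of `dims`."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for r in records:
        key = tuple(r[d] for d in dims)
        sums[key][0] += r[measure]
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


# Hypothetical annotated measurements (one record per reading).
records = [
    {"user": "u1", "env": "home", "hour": 8, "pm25": 10.0},
    {"user": "u1", "env": "street", "hour": 8, "pm25": 30.0},
    {"user": "u2", "env": "street", "hour": 8, "pm25": 20.0},
    {"user": "u2", "env": "home", "hour": 20, "pm25": 14.0},
]

by_env = roll_up(records, ["env"], "pm25")
by_user_hour = roll_up(records, ["user", "hour"], "pm25")
print(by_env)
print(by_user_hour)
```

Changing the `dims` list switches the level of aggregation (for example per micro-environment, or per user and hour) without changing the underlying records.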

Candidate profile:

– Good background in data mining and machine learning
– Strong programming, system, and database skills
– Good oral communication and technical reading and writing skills in English
– Proficiency in French is desirable

Required education and skills:
The applicant should hold a Master's degree in computer science, or equivalent.

Place of employment:
Hosting laboratory:
DAVID Lab/ADAM Team, University of Versailles St-Quentin / Paris-Saclay University: www.david.uvsq.fr
Located in the city of Versailles
Doctoral school: https://www.universite-paris-saclay.fr/en/doctorate

Attached document: PhD_Proposal_Polluscope_Versailles_France.pdf

MaDICS

Masses de Données, Informations et Connaissances en Sciences

Big Data - Data Science