Non-Stationary and robust Reinforcement Learning methodologies for surveillance applications

04/09/2022 – 05/09/2022 all-day

Offer related to the Action/Network: – — –/PhD students

Laboratory/Company: Laboratoire des signaux et systèmes (L2S), Univers
Duration: 3 years
Contact:
Publication deadline: 2022-09-04

Context:
Reinforcement Learning (RL) methodologies are currently adopted in many contexts that require sequential decision-making under uncertainty [1]. The RL paradigm is based on the perception-action cycle: an agent senses and explores an unknown environment, tracks the evolution of the system state, and adapts its behaviour in order to fulfil a specific mission. This is accomplished through a sequence of actions that aim to optimize a pre-assigned performance metric (the reward). Countless applications can benefit from this perception-action cycle (traffic signal control and robot interaction with physical objects, to cite just a few), each with its own definition of “uncertainty” or “unknown environment”, whose precise meaning depends strongly on the domain considered.

However, at least one crucial assumption underlies the majority of classical RL algorithms: the environment is stationary, i.e. the statistical and physical characterization of the scenario is assumed to be time-invariant. This is a restrictive limitation in many real-world RL applications, where the agent is usually embedded in a changing scenario whose statistical and physical characterization may both evolve over time. Given the importance of including non-stationarity in the RL framework, both theoretical and application-oriented non-stationary approaches have recently been proposed in the RL literature (e.g. [2], [3]).

Among the numerous potential applications, this project focuses on the problem of Cognitive Radar (CR) detection in unknown and non-stationary environments. Specifically, building upon the previous works [4], [5], we aim to propose an RL-based algorithm for cognitive multi-target detection in the presence of unknown, non-stationary disturbance statistics.
The radar acts as an agent that continuously senses the unknown environment (i.e., targets and disturbance) and consequently optimizes transmitted waveforms in order to maximize the probability of detection (PD) by focusing the energy in specific range-angle cells.
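
To make the effect of non-stationarity concrete, the following is a minimal toy sketch, not the algorithm of [4] or [5]. It assumes a hypothetical scene with three cells, binary rewards (1 = detection) and per-cell detection probabilities that change once, halfway through the run. The function name `run_agent`, the probabilities, `epsilon` and the step size are all illustrative assumptions. The agent is epsilon-greedy and updates its value estimates either by a sample average, which implicitly assumes a stationary environment, or with a constant step size, a standard device for tracking non-stationary problems discussed in [1].

```python
import random


def run_agent(step_size, seed=0, steps_per_phase=1000, epsilon=0.1):
    """Epsilon-greedy agent on a 3-cell toy scene whose statistics change once.

    step_size=None uses the sample-average update (stationary assumption);
    a float uses a constant step size, which forgets old data exponentially.
    Returns the mean reward collected after the change point.
    """
    rng = random.Random(seed)
    # Hypothetical per-cell detection probabilities before and after the change.
    phases = [(0.9, 0.2, 0.2), (0.1, 0.2, 0.9)]
    q = [0.0, 0.0, 0.0]   # estimated value of each action (cell to illuminate)
    counts = [0, 0, 0]    # number of times each action was taken
    post_change_reward = 0.0
    for t in range(2 * steps_per_phase):
        probs = phases[t // steps_per_phase]
        # Action: explore with probability epsilon, otherwise exploit.
        if rng.random() < epsilon:
            a = rng.randrange(len(q))
        else:
            a = max(range(len(q)), key=lambda i: q[i])
        # Perception: binary reward drawn from the current environment.
        r = 1.0 if rng.random() < probs[a] else 0.0
        counts[a] += 1
        alpha = 1.0 / counts[a] if step_size is None else step_size
        q[a] += alpha * (r - q[a])   # incremental value update
        if t >= steps_per_phase:
            post_change_reward += r
    return post_change_reward / steps_per_phase


if __name__ == "__main__":
    print("sample average    :", run_agent(None))
    print("constant step 0.1 :", run_agent(0.1))
```

In this toy setting the constant-step agent is expected to re-lock onto the best cell soon after the change, while the sample-average agent stays anchored to outdated statistics for much longer.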

Subject:
The scientific goal of the proposed PhD thesis is twofold. Firstly, the PhD candidate will become familiar with, and develop original, RL-based algorithms for non-stationary environments. These theoretical outcomes will then be applied to a specific scenario of great current interest: the radar detection of drones. More specifically, the PhD thesis will be structured in two steps:
1. Theoretical foundation of non-stationary RL algorithms: The aim of this first step is to develop an original theoretical foundation for non-stationary Markov Decision Processes (MDPs) [2]. In particular, the candidate will investigate the possibility of generalizing classical RL methodologies to MDPs characterized by time-varying sets of states, actions and reward functions. This non-stationary generalization is of crucial importance for a wide variety of applications and remains a largely unexplored research field.
2. Surveillance applications and drone detection: The theoretical results obtained in the first part of the PhD thesis will then be used to derive and implement new algorithms for drone detection and tracking using radar systems [4], [5]. Specifically, we will consider a co-located Multiple-Input Multiple-Output (MIMO) radar with a large (“massive”) number of transmitters and receivers. It has been shown that this massive MIMO configuration makes it possible to dispense with unrealistic assumptions about the a-priori knowledge of the statistical model of the disturbance [4].
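
As an illustration of what a time-varying MDP can look like, here is a minimal sketch under strong simplifying assumptions: a finite horizon, a fixed state set, and a time-indexed model that is fully known. That is precisely the knowledge an RL agent lacks, so this is a planning baseline rather than the learning method the thesis targets. It applies the classical backward recursion V_t(s) = max_a sum_s' P_t(s'|s,a) [r_t(s,a,s') + gamma V_{t+1}(s')], with the action set allowed to depend on time. All names (`backward_induction`, `actions`, `model`) and the two-state example are hypothetical.

```python
def backward_induction(horizon, states, actions, model, gamma=1.0):
    """Finite-horizon dynamic programming for a time-varying MDP.

    actions(t, s)  -> iterable of actions available in state s at time t
    model(t, s, a) -> list of (probability, next_state, reward) triples
    Returns (values, policy) with values[t][s] and policy[t][s].
    """
    values = [{s: 0.0 for s in states} for _ in range(horizon + 1)]
    policy = [{} for _ in range(horizon)]
    for t in reversed(range(horizon)):
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions(t, s):
                # Expected return of action a taken at time t in state s.
                q = sum(p * (r + gamma * values[t + 1][s2])
                        for p, s2, r in model(t, s, a))
                if q > best_q:
                    best_a, best_q = a, q
            values[t][s], policy[t][s] = best_q, best_a
    return values, policy


# Toy example: the action set shrinks at t = 1 and the reward changes after t = 0.
def actions(t, s):
    return ["stay"] if t == 1 else ["stay", "move"]


def model(t, s, a):
    s2 = s if a == "stay" else ("B" if s == "A" else "A")
    if t == 0:
        r = 1.0 if s2 == "A" else 0.0   # early on, being in A pays 1
    else:
        r = 2.0 if s2 == "B" else 0.0   # later, being in B pays 2
    return [(1.0, s2, r)]               # deterministic transition


values, policy = backward_induction(3, ["A", "B"], actions, model)
```

In this example the optimal first action from state A is to move to B, giving up the immediate reward, because moving is unavailable at t = 1 and B is the rewarding state from then on. A policy computed for the t = 0 statistics alone would miss this.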

[1] R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, Cambridge, 2018.
[2] E. Lecarpentier, E. Rachelson, “Non-Stationary Markov Decision Processes, a Worst-Case Approach using Model-Based Reinforcement Learning,” Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2019, pp. 7214–7223.
[3] S. Padakandla, K. J. Prabuchandran, S. Bhatnagar, “Reinforcement learning algorithm for non-stationary environments,” Applied Intelligence, vol. 50, pp. 3590–3606, 2020.
[4] S. Fortunati, L. Sanguinetti, F. Gini, M. S. Greco, and B. Himed, “Massive MIMO radar for target detection,” IEEE Transactions on Signal Processing, vol. 68, pp. 859–871, 2020.
[5] A. M. Ahmed, A. A. Ahmad, S. Fortunati, A. Sezgin, M. S. Greco, and F. Gini, “A reinforcement learning based approach for multitarget detection in massive MIMO radar,” IEEE Transactions on Aerospace and Electronic Systems, vol. 57, no. 5, pp. 2622–2636, 2021.

Candidate profile:
This interdisciplinary project requires skills in statistical signal processing and machine learning, with a specific focus on Reinforcement Learning. Basic knowledge of radar principles is useful but not required. As for programming languages, the candidate should have a good knowledge of MATLAB and, ideally, Python.

Required education and skills:
1) Statistics,
2) Reinforcement Learning,
3) Statistical signal processing.

Work address:
Laboratoire des signaux et systèmes (L2S),
bât. Bréguet, 3, rue Joliot Curie,
91190 Gif-sur-Yvette.

Attached document: 202206030915_PhD_Proposal_Fortunati.pdf