Temporal phenotyping of patients from EHR data based on tensor decomposition

01/02/2022 – 02/02/2022 all-day

Offer related to the Action/Network: DOING

Laboratory/Company: Inria Lyon
Duration: 4 to 6 months
Contact: thomas.guyet@inria.fr
Publication deadline: 2022-02-01

Context:
**Supervising environment**

This project contributes to the AI-RACLES chair, funded by Inria, APHP and CentraleSupélec (CS). Inria is the French national institute for digital science, APHP is the Greater Paris University Hospitals group, and CentraleSupélec is a prestigious engineering school. AI-RACLES aims to develop artificial intelligence techniques that better exploit the APHP data lake in order to improve the healthcare system and its practices, especially for fragile patients.

The internship is proposed by two chair holders of AI-RACLES (Thomas Guyet and Pr. Etienne Audureau) and it will be supervised by:
* Thomas Guyet, Inria, Lyon, thomas.guyet@inria.fr
* Pr. Etienne Audureau, APHP/UPEC, CEpiA (Clinical Epidemiology and Ageing), CHU Henri Mondor, etienne.audureau@aphp.fr
* Romain Tavenard, Univ. Rennes/LETG, romain.tavenard@univ-rennes2.fr

There will be opportunities for a funded PhD position after the internship.


The APHP data lake is a huge Electronic Health Record (EHR) repository covering patients admitted to any of the hospitals in the Greater Paris area. The database contains information about patient visits, including the care and drugs delivered during each visit (with their timestamps). For example, the APHP identified a cohort of more than 20,000 patients hospitalized during the Covid-19 crisis. A dataset was then created from information on their condition and the care they received. This information constitutes their care pathway.

The main objective of the AI-RACLES chair is to develop new artificial intelligence techniques to analyze this data lake in order to address health questions. This internship investigates how to support the evaluation of health care pathways. A health care pathway denotes the sequence of care events received by a patient treated for a given disease. Quality assessment aims to identify the key characteristics of pathways that are likely to lead to a positive outcome for the patient. For example, in the case of the Covid-19 crisis, it is of interest to identify the care strategies that would prevent patients from requiring intensive care.

The first step toward this objective is to describe the actual care pathways. The APHP data lake gives us the opportunity to describe the care pathways of patients objectively from historical data. This internship aims to contribute to identifying care pathways using unsupervised or semi-supervised machine learning techniques.

Subject:
The proposed research direction is the use of a powerful unsupervised machine learning technique called tensor factorization (or tensor decomposition).

In the context of EHR data analysis, the data is seen as a three-dimensional tensor whose dimensions are the patient identifier, the time, and the medical events (procedures, lab tests, drugs delivered). The decomposition into two-dimensional tensors allows the identification of typical patient profiles (the medical events per patient), which are called phenotypes. A care pathway is then represented by a sequence of phenotypes.
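To make this representation concrete, here is a minimal sketch on toy data. It is not the method to be developed in the internship: as a simplification, it unfolds the patient × time × event tensor into a matrix and applies plain non-negative matrix factorization (Lee-Seung multiplicative updates) instead of a full tensor decomposition. The function names, the `(patient, time, event)` record format and the toy records are assumptions made for illustration.

```python
import numpy as np

def build_tensor(records, n_patients, n_times, n_events):
    """Count tensor X[p, t, e]: occurrences of event e for patient p at time step t."""
    X = np.zeros((n_patients, n_times, n_events))
    for p, t, e in records:
        X[p, t, e] += 1.0
    return X

def nmf_phenotypes(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize the unfolded tensor M (patients*times x events) as W @ H, with W, H >= 0.

    H[r] is a phenotype (a weighted combination of medical events);
    W, reshaped to (patients, times, rank), gives each patient's sequence
    of phenotype weights, i.e. a care pathway.
    """
    rng = np.random.default_rng(seed)
    P, T, E = X.shape
    M = X.reshape(P * T, E)
    W = rng.random((P * T, rank)) + eps
    H = rng.random((rank, E)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the squared Frobenius reconstruction error.
        H *= (W.T @ M) / (W.T @ W @ H + eps)
        W *= (M @ H.T) / (W @ H @ H.T + eps)
    return W.reshape(P, T, rank), H

# Hypothetical toy records: events 0 and 1 always co-occur, event 2 occurs alone.
records = [(0, 0, 0), (0, 0, 1), (0, 1, 2),
           (1, 0, 0), (1, 0, 1), (1, 1, 2),
           (2, 0, 2), (2, 1, 0), (2, 1, 1)]
X = build_tensor(records, n_patients=3, n_times=2, n_events=3)
pathways, phenotypes = nmf_phenotypes(X, rank=2)
print(np.round(phenotypes, 2))   # one row per phenotype, one column per event
print(np.round(pathways[0], 2))  # phenotype weights of patient 0 over time
```

In this toy setting, each phenotype describes events at a single time step only; it carries no information about the order of events.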

Tensor decomposition is an old statistical problem, for which approaches have been proposed since the early twentieth century. In recent years, it has been revisited in the light of machine learning and neural networks. Several neural network architectures have recently been proposed, and they have demonstrated that large and complex tensors can be decomposed efficiently. In parallel, the value of phenotyping from EHR data has also been highlighted in the biomedical literature.

In this internship, we would like to investigate the notions of temporal phenotypes and temporal phenotyping. Contrary to a phenotype, which gives a combination of medical events at a single time instant, a temporal phenotype describes a temporal arrangement of medical events. It is thus more expressive and may be useful to identify the short-term procedures that make up the care pathways.
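One possible way to picture a temporal phenotype is as a small event × time-lag matrix that is "stamped" onto a patient's timeline each time the phenotype starts. The sketch below assumes such a convolutive representation; the array layout, the function name `render_pathway` and the toy phenotype are illustrative assumptions, not the model defined by the project.

```python
import numpy as np

def render_pathway(phenotypes, activations, n_times):
    """Expected event matrix (n_times x n_events) of one patient.

    phenotypes[r, e, l]: intensity of event e, l time steps after phenotype r starts.
    activations[r, t]:   intensity with which phenotype r starts at time step t.
    """
    R, E, L = phenotypes.shape
    X = np.zeros((n_times, E))
    for r in range(R):
        for t in range(n_times):
            a = activations[r, t]
            if a == 0:
                continue
            span = min(L, n_times - t)  # truncate at the end of the timeline
            X[t:t + span, :] += a * phenotypes[r, :, :span].T
    return X

# Hypothetical temporal phenotype: event 0 (e.g. a lab test) followed,
# one time step later, by event 1 (e.g. a drug delivery).
phenotypes = np.zeros((1, 2, 2))
phenotypes[0, 0, 0] = 1.0
phenotypes[0, 1, 1] = 1.0

# The phenotype starts at time steps 1 and 3 of a 5-step hospital stay.
activations = np.zeros((1, 5))
activations[0, [1, 3]] = 1.0

X = render_pathway(phenotypes, activations, n_times=5)
print(X)
```

Under these assumptions, temporal phenotyping amounts to recovering both `phenotypes` and `activations` from the observed event matrices.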

A similar objective is targeted by Emonet et al. with Temporal Analysis of Motif Mixtures (TAMM). The problem of identifying temporal phenotypes (topic models) is addressed by a non-parametric Bayesian model fitted using Gibbs sampling. The limitations of this proposal are the slowness and resource consumption of the solving technique, and the rigidity of the model (modifying the model requires deriving a new sampler).

A starting point of the internship will be to adapt the TAMM model so that it can be solved using machine learning techniques, and to evaluate it in terms of both efficiency and accuracy. Then, the implemented model will be applied to extract temporal patient phenotypes from the APHP Covid-19 cohort data and contribute to 1) describing Covid-19 patients, possibly by criticality group, and 2) describing hospitalizations by conditions (comparison of new and historical ICUs). A secondary objective is to investigate the possibility of using these models to create discriminant temporal phenotypes, i.e. phenotypes that occur more frequently in one group of patients than in the others.
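As a rough illustration of what "discriminant" could mean here, the sketch below compares how often each phenotype occurs inside and outside a group of patients. This occurrence-rate difference is only one simple, hypothetical score, not the criterion chosen by the project; the function name, the boolean occurrence matrix and the toy values are assumptions.

```python
import numpy as np

def discriminance(occurs, group):
    """Per-phenotype occurrence rates inside and outside a patient group.

    occurs[p, r]: True if phenotype r appears in the pathway of patient p.
    group[p]:     True if patient p belongs to the group of interest.
    Returns (rate_in, rate_out, rate_in - rate_out).
    """
    occurs = np.asarray(occurs, dtype=float)
    group = np.asarray(group, dtype=bool)
    rate_in = occurs[group].mean(axis=0)
    rate_out = occurs[~group].mean(axis=0)
    return rate_in, rate_out, rate_in - rate_out

# Hypothetical toy data: 4 patients, 2 phenotypes, the first two patients
# form the group of interest (e.g. patients who required intensive care).
occurs = [[1, 0],
          [1, 1],
          [0, 1],
          [0, 0]]
group = [True, True, False, False]

rate_in, rate_out, diff = discriminance(occurs, group)
print(diff)  # a large positive value flags a phenotype specific to the group
```

A model that creates discriminant phenotypes would go further and take such a criterion into account while the phenotypes are being learned, rather than scoring them afterwards.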

Candidate profile:
* You are enthusiastic about research; you love to understand problems in depth and to find elegant solutions to them.
* You have a strong background in math and computer science (Python environment for machine learning).
* You are interested in artificial intelligence and, more precisely, in machine learning, optimization techniques, data analysis, etc.
* You are interested in the health field and in contributing to the development of solutions that may help clinicians or epidemiologists.
* You speak and write English and/or French.

Required education and skills:
* You are a Master 2 student in computer science, data science or statistics, or a student at an engineering school.

Work address:
* Location: Lyon (or possibly Paris). The intern will be hosted at Inria Lyon, located on the Doua scientific campus in Villeurbanne. Some meetings will be organized in Paris.
* Data access is secured.
* Application by email with CV, motivation letter and transcripts.
* Start date between February and May (4 to 6 months).

Attached document: 202111151133_sujet_APHP.pdf