Temporal models of care sequences for the exploration of medico-administrative data

When:
30/05/2018 – 31/05/2018 all-day

Announcement related to the Action/Network: none

Laboratory/Company: IRISA/CHU Rennes
Duration: 36 months
Contact: thomas.guyet@irisa.fr
Publication deadline: 2018-05-30

Context:
Pharmacoepidemiology is the study of the use of drugs under real-life conditions. The ongoing opening of access to medico-administrative databases is a breakthrough for this field of medical research. Medico-administrative databases contain data collected for administrative purposes. The French SNDS (previously SNIIRAM) is the world's largest medico-administrative database, with a coverage close to 99% of the population. This makes it a real treasure for both epidemiologists and data scientists. These data (drug deliveries, medical consultations, hospitalizations and, more generally, all health care services reimbursed by the Health Insurance) constitute a wealth of readily available information. Their use in pharmacoepidemiology responds to the need for rapid answers to public health questions. However, coping with both the quantity and the complexity of health data remains an open challenge.

The main difficulty in analyzing medico-administrative data is the semantic gap between the raw data (for example, a database record about the delivery at date t of a drug with ATC code N02BE01) and the nature of the events sought by clinicians (“was the patient exposed to a daily dose of paracetamol higher than 3g?”). The solution used by epidemiologists consists in enriching the data with new types of events that, on the one hand, can be generated from raw data and, on the other hand, have a medical interpretation. Such new abstract events are defined by clinicians using proxies. For example, drug deliveries can be translated into periods of drug exposure (drug exposure is a time-dependent variable for non-random reasons) or used to identify the stages of a patient's illness. A proxy can be seen as an abstract description of a care sequence.
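
As an illustration of such a proxy, the translation of drug deliveries into exposure periods can be sketched as follows. This is a minimal sketch, not the method used in any actual study: the function name exposure_periods, the fixed 30-day coverage per delivery and the max_gap tolerance are all assumptions made for the example.

```python
from datetime import date, timedelta


def exposure_periods(delivery_dates, days_covered=30, max_gap=0):
    """Merge drug deliveries into continuous exposure periods.

    Assumption: every delivery covers `days_covered` days. A delivery that
    starts at most `max_gap` days after the end of the current period
    extends that period; otherwise it opens a new one.
    Returns a list of (start, end) date pairs.
    """
    periods = []
    for start in sorted(delivery_dates):
        end = start + timedelta(days=days_covered)
        if periods and start <= periods[-1][1] + timedelta(days=max_gap):
            # Overlapping or close enough: extend the current period.
            previous_start, previous_end = periods[-1]
            periods[-1] = (previous_start, max(previous_end, end))
        else:
            # Too far from the previous period: start a new one.
            periods.append((start, end))
    return periods


deliveries = [date(2018, 1, 1), date(2018, 1, 25), date(2018, 4, 1)]
for start, end in exposure_periods(deliveries):
    print(start, "->", end)
```

Under these assumptions, the deliveries of 1 January and 25 January 2018 are merged into a single exposure period, while the delivery of 1 April opens a new one.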

Subject:
Currently, clinicians are limited in the expression of these proxies both by the coarse expressivity of their tools and by the need to process large amounts of data efficiently [6]. From a semantic point of view, care sequences must fully integrate the temporal and taxonomic dimensions of the data to provide significant expressive power. From a computational point of view, the methods employed must handle large amounts of data efficiently (several million care pathways).
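
To make the taxonomic dimension concrete, the sketch below relies on the hierarchical structure of ATC codes: a five-level code such as N02BE01 has prefixes of 1, 3, 4 and 5 characters that name its broader classes. The helper names atc_ancestors and belongs_to are hypothetical and only illustrate how a query on a drug class could match more specific codes.

```python
# Lengths of an ATC code at each of its five hierarchical levels.
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_ancestors(code):
    """Return the code's classes from the most general to the code itself."""
    return [code[:n] for n in ATC_LEVEL_LENGTHS if n <= len(code)]


def belongs_to(code, atc_class):
    """True if `code` is `atc_class` or one of its descendants."""
    return atc_class in atc_ancestors(code)


print(atc_ancestors("N02BE01"))
print(belongs_to("N02BE01", "N02"))
```

A proxy expressed on the class N02 would then also apply to deliveries recorded with the more specific code N02BE01.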

The aim of this thesis is to study temporal models of sequences in order 1) to show their ability to specify the complex proxies representing care sequences needed in pharmacoepidemiological studies and 2) to build an efficient querying tool able to exploit large numbers of care pathways.

In previous work, we focused on the chronicle model [5, 7], which represents a care sequence as a set of events together with numerical constraints on the delays between their occurrences. One advantage of this simple model is that it can be easily visualized by clinicians. In addition, it is efficient for querying large collections of sequences, but it shows limits in its expressiveness (especially regarding taxonomies or the expression of disjunctions). Other models of behavior have been proposed, with different time models, by various communities (e.g. logic [2, 8], discrete event systems [3, 9] or automatic verification [1]).

Each of these representations offers higher semantic power but also comes with computational limits (decidability, efficiency, etc.).
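
For the chronicle model mentioned above, a naive matching procedure can be sketched as follows. The encoding is an assumption made for illustration (a list of event types plus a dictionary of delay bounds between pairs of events), the event labels and delays are hypothetical, and the exhaustive enumeration is deliberately simple rather than efficient; it is not the algorithm of [5, 7].

```python
from itertools import product


def matches_chronicle(event_types, constraints, sequence):
    """Check whether a care sequence contains an occurrence of a chronicle.

    event_types: list of event types, one per chronicle event.
    constraints: dict {(i, j): (lo, hi)} meaning lo <= t_j - t_i <= hi,
                 where i and j index `event_types`.
    sequence:    list of (event_type, timestamp) pairs.
    """
    # For each chronicle event, list the sequence positions it could map to.
    candidates = [
        [(pos, t) for pos, (etype, t) in enumerate(sequence) if etype == wanted]
        for wanted in event_types
    ]
    for combo in product(*candidates):
        # Two chronicle events must not map to the same sequence event.
        if len({pos for pos, _ in combo}) != len(combo):
            continue
        if all(
            lo <= combo[j][1] - combo[i][1] <= hi
            for (i, j), (lo, hi) in constraints.items()
        ):
            return True
    return False


# Hypothetical care sequence: timestamps are days since inclusion.
care_sequence = [("N02BE01", 0), ("hospitalization", 5), ("N02BE01", 40)]
# Hypothetical chronicle: a hospitalization 1 to 10 days after a delivery.
print(matches_chronicle(["N02BE01", "hospitalization"], {(0, 1): (1, 10)}, care_sequence))
```

The same encoding cannot state that either of two drugs may open the pattern, nor that any code of a drug class is acceptable, which illustrates the limits on disjunctions and taxonomies noted above.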
This thesis will contribute to the PEPS platform [4], developed in collaboration by IRISA and REPERES. Querying tools based on temporal models will be deployed and evaluated on real pharmacoepidemiological studies in close collaboration with epidemiologists. Model expressivity will be evaluated against the practical needs of clinicians, from both theoretical and practical points of view.

The main stages of the PhD thesis will be: 1) state of the art and discovery of the SNDS and pharmacoepidemiology, 2) identification of potential models of care sequences and selection of 2 to 4 typical pharmacoepidemiology studies to reproduce, 3) implementation, evaluation and comparison of temporal models and 4) dissemination of the work through studies and publications.

Candidate profile:
• preferably a student preparing or holding an MSc degree (Master 2) in one of the following specialities:
– MSc in theoretical computer science (algorithmics, logic or formal models, data science, artificial intelligence), with a strong interest in medical applications and the ability to work in this application field
– MSc in (bio)medical informatics with a good background in computer science
• good ability to work in a multidisciplinary environment
• good communication skills in English (oral and written)
• autonomy and motivation for research

Required education and skills:
• preferably a student preparing or holding an MSc degree (Master 2) in one of the following specialities:
– MSc in theoretical computer science (algorithmics, logic or formal models, data science, artificial intelligence), with a strong interest in medical applications and the ability to work in this application field
– MSc in (bio)medical informatics with a good background in computer science

Work address:
IRISA
Campus de Beaulieu
35042 Rennes
FRANCE

Attached document: phd_temporalmodels_IRISA_REPERES.pdf