M2 internship -- Privacy attacks on synthetic data generation (2026-03-01)

When:

01/03/2026 – 02/03/2026 all-day

Offer related to the Action/Network: (not specified)

Laboratory/Company: IRISA Laboratory
Duration: 5-6 months
Contact: tristan.allard@irisa.fr
Publication deadline: 2026-03-01

Context:
Health data, social networks, electricity consumption… Vast quantities of personal data are collected today by private companies and public organizations. Legal, monetary, and visibility incentives push data holders to consider sharing versions of the collected datasets that provide both statistical utility and privacy guarantees. Indeed, sharing data widely (e.g., as open data) without jeopardizing privacy is expected to bring strong benefits, for example by strengthening scientific studies, innovation, and public policies.

Synthetic data generation is a promising approach. First, synthetic data generation algorithms aim to generate datasets that are as close as possible to the original datasets. Either the synthetically generated data or the generative models trained on the original data can be shared to support elaborate data analysis. Second, substantial progress has been made over the last decade on the privacy guarantees of synthetic data generation algorithms. For example, there now exist synthetic data generation algorithms that satisfy variants of differential privacy, one of the most prominent families of privacy models.

However, the wealth of generative algorithms, privacy models and algorithms, and parameters makes it hard for non-expert users to understand clearly the privacy implications of any given choice. Given the growing number of privacy attacks on machine learning models, and especially on generative algorithms, an inappropriate choice can have catastrophic consequences.

Subject:
The main goal of this M2 thesis is to design an efficient approach that allows a data holder to identify the most relevant privacy attacks given the data holder’s choices (generative algorithm, privacy model, and parameters).

The main tasks of the Master's student will be to:
• Study the state of the art in privacy attacks (e.g., membership inference attacks [2, 4, 5]). We will focus on tabular data.
• Formalize the attackers (e.g., adversarial goals, background knowledge, impacts and costs of the attacks, vulnerable algorithms), structure the space of attackers (e.g., generalization/specialization of attackers, implications), and efficiently explore the resulting space to find the attacks that best illustrate the privacy risks.
• Implement the approach and evaluate its performance.
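
To give a concrete idea of what a membership inference attack on synthetic tabular data can look like, here is a minimal sketch of a simple distance-based heuristic: a candidate record is guessed to be a training member when it lies very close to some synthetic record. This is only an illustrative baseline under stated assumptions, not the approach to be designed during the internship nor the attacks of [2, 4, 5]. The function names, the threshold, and the deliberately leaky toy "generator" are all hypothetical.

```python
import numpy as np


def dcr_scores(candidates, synthetic):
    """Distance from each candidate record to its closest synthetic record."""
    # Pairwise Euclidean distances, shape (n_candidates, n_synthetic).
    diffs = candidates[:, None, :] - synthetic[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    return dists.min(axis=1)


def infer_membership(candidates, synthetic, threshold):
    """Guess 'member' when a candidate is within `threshold` of a synthetic record."""
    return dcr_scores(candidates, synthetic) <= threshold


def attack_accuracy(members, non_members, synthetic, threshold):
    """Fraction of correct guesses over known members and known non-members."""
    guesses_in = infer_membership(members, synthetic, threshold)
    guesses_out = infer_membership(non_members, synthetic, threshold)
    correct = guesses_in.sum() + (~guesses_out).sum()
    return correct / (len(members) + len(non_members))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(200, 4))      # records used by the toy generator
    holdout = rng.normal(size=(200, 4))    # records from the same distribution, never used
    # Hypothetical, deliberately leaky generator: training records plus tiny noise.
    synthetic = train + rng.normal(scale=0.01, size=train.shape)
    print("attack accuracy:", attack_accuracy(train, holdout, synthetic, threshold=0.1))
```

On such a leaky generator the heuristic separates members from non-members almost perfectly. A generator with sound privacy guarantees should bring the accuracy close to 0.5, i.e., random guessing.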

In addition to the core tasks of the project, the successful candidate will also contribute to the organisation of competitions where the privacy guarantees of synthetic data generation algorithms are challenged.

Candidate profile:
• The candidate must be in the second year of a master’s degree, or equivalent, in computer science or in a related field.
• The candidate must be curious, autonomous, and rigorous.
• The candidate must be able to communicate in English (oral and written). Knowledge of French is not required.
• The candidate must have a strong interest in cybersecurity.
• Skills in machine learning will be appreciated.

Required education and skills:

Work address:
Campus de Beaulieu IRISA/Inria Rennes
263 avenue du Général Leclerc
35042 RENNES cedex

Attached document: 202511171626_m2-attacks-25_26.pdf

MaDICS

Masses de Données, Informations et Connaissances en Sciences

Big Data - Data Science
