Written Evidence submitted by Yves-Alexandre de Montjoye and Andrea Gadotti (Computational Privacy Group, Imperial College London) (DDA0016)
Yves-Alexandre de Montjoye
Associate Professor, Department of Computing and Data Science Institute, Imperial College London
Head, Computational Privacy Group
Director, Algorithmic Society Lab
Andrea Gadotti
Doctoral researcher, Department of Computing (Computational Privacy Group), Imperial College London
The Computational Privacy Group at Imperial College studies the privacy risks arising from large-scale datasets. We develop attacks to test the robustness of privacy-protection mechanisms and design solutions to collect and use data safely.
Summary:
This submission focuses on paragraph 123 on anonymisation in the UK Government's consultation Data: A new direction and on the role of modern privacy engineering technologies for the safe use and sharing of data for research.
Key points include:
● Paragraph 123 of the Government’s consultation proposes that data may be considered anonymous whenever it is not identifiable by the controller who processes it.
● This might be interpreted to suggest that even if data were to be highly identifiable in the hands of someone other than the controller, such data would fall outside the scope of the GDPR.
● Such an interpretation would be likely to substantially lower the technical standards for anonymisation.
● Lowering the technical standards on anonymisation would be both unnecessary and counterproductive. Modern privacy engineering can be used to collect, share, and analyse data while providing strong privacy guarantees. Lower standards on anonymisation would weaken the incentives for developing and adopting these technologies and create significant risks to the privacy of individuals.
Anonymisation standards and privacy engineering for data sharing
Modern privacy engineering offers a range of technologies that allow data to be collected, analysed, and shared safely, including:
○ Query-based systems, which allow controllers to keep datasets securely on their own servers, without sharing the raw data with researchers, while allowing researchers to send queries on the data and receive only answers aggregated over many users. These systems typically support strong security measures (e.g. authentication and activity logging) that reduce even further the chances of privacy violations. Several solutions already exist, such as OPAL[3] (developed by our group at Imperial together with MIT, Orange, and other partners) and OpenSAFELY by the University of Oxford. A minimal sketch of such a system is given after this list.
○ Differential privacy, a framework for analysing data through mechanisms that provide strong mathematical guarantees of privacy. Differentially private mechanisms can be particularly useful for sharing aggregate data broadly, for instance for transparency and reporting purposes[4]. A sketch of a simple differentially private count is given after this list.
○ Synthetic data, which is mostly useful for sharing broadly datasets that “look like” the original one, so that researchers can run tests and simulate analyses (which can then be run on the real data in a more controlled environment). Importantly, while synthetic data can help test hypotheses, these then always have to be validated against the real data. A naive example generator is sketched after this list.
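To illustrate the principle behind query-based systems, the following minimal sketch shows a controller answering only aggregate count queries and refusing queries that would cover too few users. The toy dataset, the query interface, and the threshold are illustrative assumptions only; they are not the actual OPAL or OpenSAFELY implementation.

```python
# Illustrative sketch of a query-based system: the raw records stay on the
# controller's server; researchers submit queries and receive only aggregate
# answers, and queries covering too few users are refused.
# (Assumed toy interface; not the OPAL or OpenSAFELY API.)

MIN_GROUP_SIZE = 10  # assumed threshold below which aggregates are refused

# Toy dataset held by the controller (hypothetical records).
records = [
    {"user_id": i, "age": 20 + (i * 7) % 50, "postcode": "SW7" if i % 2 else "E1"}
    for i in range(200)
]

def aggregate_count(predicate):
    """Answer 'how many users satisfy this condition?' without releasing raw data."""
    count = sum(1 for record in records if predicate(record))
    if count < MIN_GROUP_SIZE:
        raise ValueError("Query refused: the aggregate would cover too few users")
    return count

# A researcher only ever sees the aggregate answer:
print(aggregate_count(lambda r: r["age"] < 40 and r["postcode"] == "SW7"))
```

Real systems combine such a query interface with the security measures mentioned above (authentication, activity logging) and, in many designs, with noise added to the answers.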
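As a concrete example of a differentially private mechanism, the sketch below applies the Laplace mechanism to a count query: the true count is perturbed with noise calibrated to the query's sensitivity and to the privacy parameter epsilon. The data and the value of epsilon are illustrative.

```python
# Sketch of the Laplace mechanism, a standard differentially private mechanism:
# a count query is answered with noise calibrated to the query's sensitivity
# (1 for a count, since one person changes the count by at most 1) and to the
# privacy parameter epsilon. Data and epsilon are illustrative.
import numpy as np

def dp_count(records, predicate, epsilon):
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Hypothetical example: a noisy count of users under 40.
ages = [{"age": a} for a in (23, 35, 41, 52, 29, 38, 61, 27, 33, 45)]
print(dp_count(ages, lambda r: r["age"] < 40, epsilon=0.5))
```

Smaller values of epsilon give stronger privacy guarantees at the cost of noisier answers; as noted in footnote [4], the guarantees obtained in practice depend heavily on how the mechanism is deployed.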
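Finally, as a deliberately naive sketch of synthetic data generation, the code below fits a simple statistical model to the real data (here, each numeric attribute modelled independently) and samples artificial records from it. Real generators are far more sophisticated; the point is only that the released records do not correspond to real individuals, and that any findings still need to be validated on the real data.

```python
# Naive sketch of synthetic data generation: fit a simple model to the real
# data and sample artificial records from it. Here each numeric attribute is
# modelled independently with its empirical mean and standard deviation;
# real synthetic data generators use much richer models.
import numpy as np

def fit_and_sample(real_data, n_synthetic, seed=0):
    """real_data: 2D array, rows are individuals and columns are numeric attributes."""
    rng = np.random.default_rng(seed)
    synthetic_columns = [
        rng.normal(column.mean(), column.std(), n_synthetic)
        for column in real_data.T
    ]
    return np.column_stack(synthetic_columns)

# Hypothetical real data: columns are age and number of hospital visits.
real = np.array([[34, 2], [29, 0], [45, 5], [52, 1], [38, 3]], dtype=float)
synthetic = fit_and_sample(real, n_synthetic=100)
print(synthetic[:3])
```

Because the synthetic records only reflect the fitted model, analyses that work on them can succeed or fail spuriously, which is why hypotheses must always be validated against the real data, as noted above.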
We therefore make the following recommendations:
○ The anonymisation standards required by the GDPR should be preserved.
○ The Government and authorities such as the ICO should discourage the use of weak anonymisation techniques and instead encourage the use of modern privacy engineering technologies. This includes providing guidance on how anonymisation techniques can and cannot be employed to meet the intent and standards of the law.
January 2022
[1] Rocher, L., Hendrickx, J.M., de Montjoye, Y.-A., 2019. Estimating the success of re-identifications in incomplete datasets using generative models. Nature Communications 10, 1–9. https://doi.org/10.1038/s41467-019-10933-3
[2] Available at https://cpg.doc.ic.ac.uk/observatory/
[3] Oehmichen, A., Jain, S., Gadotti, A., de Montjoye, Y.-A., 2019. OPAL: High performance platform for large-scale privacy-preserving location data analytics, in: 2019 IEEE International Conference on Big Data (Big Data), pp. 1332–1342. https://doi.org/10.1109/BigData47090.2019.9006389
[4] The practical guarantees of differential privacy mechanisms should, however, be evaluated carefully based on the context; see, for example, our recent article: Houssiau, F., Rocher, L., de Montjoye, Y.-A., 2022. On the difficulty of achieving Differential Privacy in practice: user-level guarantees in aggregate location data. Nature Communications 13, 29. https://doi.org/10.1038/s41467-021-27566-0