Letter to Science: Re-identification of DNA may need ethics approval

Introduction

I had a letter published in Science magazine about the recently publicised re-identification of anonymously donated DNA data. It has been shown that there is enough named genetic information online, in genealogical databases for instance, that anonymous DNA posted in research databases can be re-identified. This is a sobering result, but does it mean that ‘privacy is dead’? Not at all.

The fact is that re-identification of erstwhile anonymous data represents a new act of collection of PII and as such is subject to the Collection Limitation Principle in privacy law around the world. This type of data processing is essentially the same as Facebook using biometric facial recognition to identify people in anonymous photos. European regulators recently found Facebook to have breached privacy law with its Tag Suggestions function and have forced Facebook to shut down their facial recognition feature.

I expect the same legal principles to apply to the re-identification of DNA. That is, there are legal constraints on what can be done with ‘anonymous’ data no matter where you get it from: under most data privacy laws, attaching names to previously anonymous data constitutes a Collection of PII, and as such, is subject to consent rules and all sorts of other principles. As a result, bioinformatics researchers will have to tread carefully, justifying their ends and their means before ethics committees. And corporations that seek to exploit the ability to put names on anonymous genetic data may face the force of the law as Facebook did.

The text of my letter to Science follows, and after that, I’ll keep posting follow-ups.

Legal Limits to Data Re-Identification

Science 8 February 2013:
Vol. 339 no. 6120 p. 647

Yaniv Erlich at the Whitehead Institute for Biomedical Research used his hacking skills to decipher the names of anonymous DNA donors (“Genealogy databases enable naming of anonymous DNA donor,” J. Bohannon, 18 January, p. 262). A little-known legal technicality in international data privacy laws could curb the privacy threats of reverse identification from genomes. “Personal information” is usually defined as any data relating to an individual whose identity is readily apparent from the data. The OECD Privacy Principles are enacted in over 80 countries worldwide [1]. Privacy Principle No. 1 states: “There should be limits to the collection of personal data and any such data should be obtained by lawful and fair means and, where appropriate, with the knowledge or consent of the data subject.” The principle is neutral regarding the manner of collection. Personal information may be collected directly from an individual or indirectly from third parties, or it may be synthesized from other sources, as with “data mining.”

Computer scientists and engineers often don’t know that recording a person’s name against erstwhile anonymous data is technically an act of collection. Even if the consent form signed at the time of the original collection includes a disclaimer that absolute anonymity cannot be guaranteed, re-identifying the information later signifies a new collection. The new collection of personal information requires its own consent; the original disclaimer does not apply when third parties take data and process it beyond the original purpose for collection. Educating those with this capability about the legal meaning of collection should restrain the misuse of DNA data, at least in those jurisdictions that strive to enforce the OECD principles.

It also implies that bioinformaticians working “with little more than the Internet” to attach names to samples may need ethics approval, just as they would if they were taking fresh samples from the people concerned.

Stephen Wilson
Lockstep Consulting Pty Ltd
Five Dock Sydney, NSW 2046, Australia.
E-mail: swilson@lockstep.com.au

Recap

Let’s assume Subject S donates their DNA, ostensibly anonymously, to a Researcher R1, under some consent arrangement which concedes there is a possibility that S will be re-identified. And indeed, some time later, an independent researcher R2 does identify S as belonging to the DNA sample. The fact that many commentators seem oblivious to is this: R2 has Collected Personal Information (or PII) about S.* If R2 has no relationship with S, then S has not consented to this new collection of their PII. In jurisdictions with strict Collection Limitation (like the EU, Australia and elsewhere), it is a privacy breach for R2 to collect PII by way of DNA re-identification without express consent, regardless of whether R1 has conceded to S that it might happen. Even in the US, where the protections might not be so strict, there remains a question of ethics: should R2 conduct themselves in a manner that might be unlawful in other places?

* Footnote: The collection by R2 of PII about S in this case is indirect (or algorithmic), but nevertheless is a fresh collection. Logically, if anonymous data is converted into identifiable data, then PII has been collected. In most cases, the Collection Limitation principle in privacy law is technology neutral. Privacy laws do not generally care how PII is collected; if PII is created by some process (such as re-identification) then privacy principles still apply. And this is what consumers would expect. If a Big Data process allows an organisation to work out insights about people without having to ask them explicit questions (which is precisely why Big Data is so valuable to business and governments), then the individuals concerned should expect that privacy principles still apply.

Follow up

In an interview with Science Magazine on Jan 18, the Whitehead Institute’s Melissa Gymrek discussed the re-identification methods, and the potential to protect against them. She concluded: “I think we really need to learn to deal with the fact that we cannot ever make data sets truly anonymous, and that I think the key will be in regulating how we are allowed to use this genetic data to prevent it from being used maliciously.”

I agree completely. We need regulations. Elsewhere I’ve argued that anonymity is an inadequate way to protect privacy, and that we need a balance of regulations and Privacy Enhancing Technologies. And it’s for this reason that I am not fatalistic about the fact that anonymity can be broken, because we have (and will always need) the procedural means to see that privacy is still preserved.