How are we to defend anonymity?

If anonymity is important, what is the legal basis for defending it?
I find that conventional data privacy law in most places around the world already protects anonymity, insofar as the act of de-anonymization represents an act of personal data collection – the creation of a named record. As such, de-anonymization cannot be lawfully performed without an express need to do so, or consent.

Foreword

Cynics have been asking the same rhetorical question “is privacy dead?” for at least 40 years. Certainly information technology and ubiquitous connectivity have made it nearly impossible to hide, and so anonymity is critically ill. But privacy is not the same thing as secrecy; privacy is a state where those who know us respect the knowledge they have about us. Privacy generally doesn’t require us to hide from anyone; it requires restraint on the part of those who hold Personal Information about us.

The typical public response to data breaches, government surveillance and invasions like social media facial recognition is vociferous. People in general energetically assert their rights not to be tracked online, and not to have their personal information exploited behind their backs. These reactions show that the idea of privacy is alive and well.

The end of anonymity perhaps

Against a backdrop of spying revelations and excesses by social media companies, especially with regard to facial recognition, there have been recent calls for a “new jurisprudence of anonymity”; see Yale law professor Jed Rubenfeld writing in the Washington Post of 13 Jan 2014. I wonder if there is another way to crack the nut, because any new jurisprudence is going to take a very long time.

Instead, I suggest we leverage the way most international privacy law and privacy experience, going back decades, is technology neutral with regard to the method of collection. In some jurisdictions like Australia, the term “collection” is not even defined in privacy law. Instead, the law simply uses the normal plain English sense of the word when it frames principles like Collection Limitation: basically, you are not allowed to collect (by any means) personal data without a good, express reason. It means that if personal data gets into a data system, the system is accountable under privacy law for that data, no matter how it got there.

This technology neutral view of personal data collection has satisfying ramifications for all the people who intuit that Big Data has got too “creepy”. We can argue that if a named record is produced afresh by a Big Data process (especially if that record is produced without the named person being aware of it, and from raw data that was originally collected for some other purpose) then that record has logically been collected. Whether personal data is collected directly by questionnaire or indirectly by algorithm — or created by an obscure process — privacy law is largely agnostic.

I suggest that the output of the data mining, if it is personally identifiable and especially if it has been rendered identifiable by processing previously anonymous raw data, is a fresh collection by the mining operation. As such, the miners should be accountable for their newly minted personal data, just as though they had collected it directly from the persons concerned.

For now, I don’t want to go further and argue the rights and wrongs of surveillance. I just want to show a new way to frame the privacy questions in surveillance and big data, making use of existing jurisprudence. If I am right and the NSA is in effect collecting personal data as it goes about its data mining, then that provides a possibly fresh understanding of what’s going on, within which we can objectively analyse the rights and wrongs.

I am actually the first to admit that within this frame, the NSA might still be justified in mining data, and there might be no actual technical breach of information privacy law, if for instance the NSA enjoys a law enforcement exemption. These are important questions that need to be debated, but elsewhere (see my recent blog on our preparedness to actually have such a debate). My purpose right now is to frame a way to defend anonymity using as much existing legal infrastructure as possible.

But Collection is not limited everywhere

There is an important legal-technical question in all this: how is the collection of personal data actually regulated? In Europe, Australia, New Zealand and in dozens of countries, collection is limited, but in the U.S.A. there is no general restriction against collecting personal data. America has no broad data protection law, and in any case, some sets of Fair Information Practice Principles (FIPPs) don’t even feature Collection Limitation!

So there may be few regulations in the U.S. that would carry my argument there! Nevertheless, surely we can use international jurisprudence in Collection Limitation instead of creating new American jurisprudence around anonymity?

So I’d like to put the following questions to Jed Rubenfeld:

Do technology neutral Collection Limitation Principles in theory provide a way to bring de-anonymised data into scope for data privacy laws? Is this a way to address people’s concerns with Big Data?
How does international jurisprudence around Collection Limitation translate to American schools of legal thought?
Does this way of looking at the problem create new impetus for Collection Limitation to be introduced into American privacy principles, especially the FIPPs?

Appendix: “Applying Information Privacy Norms to Re-Identification”

In 2013 I presented some of these ideas to an online symposium at the Harvard Law School Petrie-Flom Center, on the Law, Ethics & Science of Re-identification Demonstrations. What follows is an extract from that presentation, in which I spell out carefully the argument — which was not obvious to some at the time — that when genetics researchers combine different data sets to demonstrate re-identification of donated genomic material, they are in effect collecting patient personal data. I argue that this type of collection should be subject to ethics committee approval just as if the researchers were collecting the identities from the patients directly.

… I am aware of two distinct re-identification demonstrations that have raised awareness of the issues recently. In the first, Yaniv Erlich [at MIT’s Whitehead Institute] used what I understand are new statistical techniques to re-identify a number of subjects who had donated genetic material anonymously to the 1000 Genomes project. He did this by correlating genes in the published anonymous samples with genes in named samples available from genealogical databases. The 1000 Genomes consent form reassured participants that re-identification would be “very hard”. In the second notable demo, Latanya Sweeney re-identified volunteers in the Personal Genome Project using her previously published method of using a few demographic values (such as date of birth, sex and postal code) extracted from the otherwise anonymous records.
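
To make the mechanics of demographic linkage concrete, here is a minimal sketch in Python. It is an illustration only, and not Sweeney's published method or code: the records, names, field names and the `link` function are all invented for the example. It assumes the simplest possible case, where an exact match on date of birth, sex and postal code that points to exactly one named person is treated as a re-identification.

```python
from collections import defaultdict

# Hypothetical "anonymous" research records: no names, only demographic values.
anonymous_records = [
    {"sample_id": "A1", "dob": "1970-03-14", "sex": "F", "postcode": "02139"},
    {"sample_id": "A2", "dob": "1982-11-02", "sex": "M", "postcode": "02139"},
    {"sample_id": "A3", "dob": "1990-07-21", "sex": "F", "postcode": "94110"},
]

# Hypothetical named public list (for example a directory or roll).
named_records = [
    {"name": "Alice Example", "dob": "1970-03-14", "sex": "F", "postcode": "02139"},
    {"name": "Bob Example", "dob": "1982-11-02", "sex": "M", "postcode": "02139"},
    {"name": "Brian Example", "dob": "1982-11-02", "sex": "M", "postcode": "02139"},
]

# Assumed set of linking fields, taken from the examples in the text.
QUASI_IDENTIFIERS = ("dob", "sex", "postcode")


def key(record):
    """Build the linkage key from the demographic fields of a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymous, named):
    """Return new named records for samples that match exactly one person."""
    names_by_key = defaultdict(list)
    for person in named:
        names_by_key[key(person)].append(person["name"])

    linked = []
    for record in anonymous:
        candidates = names_by_key.get(key(record), [])
        # Only an unambiguous match counts as a re-identification here.
        if len(candidates) == 1:
            linked.append({"sample_id": record["sample_id"], "name": candidates[0]})
    return linked


new_named_records = link(anonymous_records, named_records)
print(new_named_records)
```

In this toy data only sample A1 is linked: A2 matches two people and A3 matches nobody. Each entry in `new_named_records` is a named record that did not exist before the join was run.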

A great deal of the debate around these cases has focused on the consent forms and the research subjects’ expectations of anonymity. These are important matters for sure, yet for me the ethical issue in de-anonymisation demonstrations is more about the obligations of third parties doing the identification who had nothing to do with the original informed consent arrangements. The act of recording a person’s name against erstwhile anonymous data represents a collection of personal information. The implications for genomic data re-identification are clear.

Let’s consider Subject S who donates her DNA, ostensibly anonymously, to a Researcher R1, under some consent arrangement which concedes there is a possibility that S will be re-identified. And indeed, some time later, an independent researcher R2 does identify S and links her to the DNA sample. The fact is that R2 has collected personal information about S. If R2 has no relationship with S, then S has not consented to this new collection of her personal information.

Even if the consent form signed at the time of the original collection includes a disclaimer that absolute anonymity cannot be guaranteed, re-identifying the DNA sample later represents a new collection, one that has been undertaken without any consent. Given that S has no knowledge of R2, there can be no implied consent in her original understanding with R1, even if absolute anonymity was disclaimed.

Naturally the re-identification demonstrations have served a purpose. It is undoubtedly important that the limits of anonymity be properly understood, and the work of Yaniv and Latanya contributes to that. Nevertheless, these demonstrations were undertaken without the knowledge, much less the consent, of the individuals concerned. I contend that bioinformaticians using clever techniques to attach names to anonymous samples need ethics approval, just as they would if they were taking fresh samples from the people concerned.