The Snapchat data breach

Yesterday it was reported by The Verge that anonymous hackers have accessed Snapchat’s user database and posted 4.6 million user names and phone numbers. In an apparent effort to soften the blow, two digits of the phone numbers were redacted. So we might assume this is a “white hat” exercise, designed to shame Snapchat into improving their security. Indeed, a few days ago Snapchat themselves said they had been warned of vulnerabilities in their APIs that would allow a mass upload of user records.

The response of many has been, well, so what? Some people have casually likened Snapchat’s list to a public White Pages; others have played it down as “just email addresses”.

Let’s look more closely. The leaked list was not in fact public names and phone numbers; it was user names and phone numbers. User names might often be email addresses but these are typically aliases; people frequently choose email addresses that reveal little or nothing of their real world identity. We should assume there is intent in an obscure email address for the individual to remain secret.

Identity theft has become a highly organised criminal enterprise. Crime gangs patiently acquire multiple data sets over many months, sometimes years, gradually piecing together detailed personal profiles. It’s been shown time and time again by privacy researchers (perhaps most notably Latanya Sweeney) that re-identification is enabled by linking diverse data sets. And for this purpose, email addresses and phone numbers are superbly valuable indices for correlating an individual’s various records. Your email address is common across most of your social media registrations. And your phone number allows your real name and street address to be looked up from reverse White Pages. So the Snapchat breach could be used to join aliases or email addresses to real names and addresses via the phone numbers. For a social engineering attack on a call centre — or even to open a new bank account — an identity thief can go an awful long way with real name, street address, email address and phone number.

I was asked in an interview to compare the theft of stolen phone numbers with social security numbers. I surprised the interviewer when I said phone numbers are probably even more valuable to the highly organised ID thief, for they can be used to index names in public directories, and to link different data sets, in ways that SSNs (or credit card numbers for that matter) cannot.

So let us start to treat all personal inormation — especially when aggregated in bulk — more seriously! And let’s be more cautious in the way we categorise personal or Personally Identifiable Information (PII).

Importantly, most regulatory definitions of PII already embody the proper degree of caution. Look carefully at the US government definition of Personally Identifiable Information:

information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual (underline added).

This means that items of data can constitute PII if other data can be combined to identify the person concerned. That is, the fragments are regarded as PII even if it is the whole that does the identifying.

And remember that the middle I in PII stands for Identifiable, and not, as many people presume, Identifying. To meet the definition of PII, data need not uniquely identify a person, it merely needs to be directly or indirectly identifiable with a person. And this is how it should be when we heed the way information technologies enable identification through linkages.

Almost anywhere else in the world, data stores like Snapchat’s would automatically fall under data protection and information privacy laws; regulators would take a close look at whether the company had complied with the OECD Privacy Principles, and whether Snapchat’s security measures were fit for purpose given the PII concerned. But in the USA, companies and commentators alike still have trouble working out how serious these breaches are. Each new breach is treated in an ad hoc manner, often with people finessing the difference between credit card numbers — as in the recent Target breach — and “mere” email addresses like those in the Snapchat and Epsilon episodes.

Surely the time has come to simply give proper regulatory protection to all PII.