
Is it Personal Information or not? Embrace the uncertainty.

The Australian parliament recently revised our definition of Personal Information (or, roughly equivalently, what Americans usually call Personally Identifiable Information, or PII). We have lowered the bar somewhat, with a new regime that will categorise even more examples of data as Personal Information (PI). And this has triggered fresh anxieties amongst security practitioners about different interpretations of the law. But I like to think the more liberal definition provides an opportunity for security professionals to actually embrace privacy practice, precisely because, more than ever, privacy management is about uncertainty and risk.

It's a timely regulatory development too, given the present debates over telecommunications surveillance metadata. Our broader new definition of Personal Information probably encompasses a great deal of the items law enforcement are interested in, such as phone numbers and IP addresses. I've noted elsewhere that if politicians genuinely want to convene a public dialogue about what they claim is a necessary re-balancing of security and privacy, then we have a mature legislated framework with which to structure that conversation.

Since 1988, the Australian Privacy Act has defined Personal Information as:

  • information or an opinion ... whether true or not, and whether recorded in a material form or not, about an individual whose identity is apparent, or can reasonably be ascertained, from the information or opinion (underline added).

The Privacy Amendment (Enhancing Privacy Protection) Act 2012 says that PI is:

  • information or an opinion about an identified individual, or an individual who is reasonably identifiable: (a) whether the information or opinion is true or not; and (b) whether the information or opinion is recorded in a material form or not (underline added).

What matters for the present discussion is that the amendments remove the previous condition that identification of the individual be done from the Personal Information itself. So under the new definition, we are required to consider data as PI if there is a reasonable likelihood that the individual concerned may be identified in future by any means!

Note that much of what follows applies equally to some US definitions of "PII". The US General Services Administration defines Personally Identifiable Information as information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual (underline added). PII need not uniquely identify a person; it merely needs to be identifiable with a person. The GSA definition means that items of data can constitute PII if other data can be combined with them to identify the person concerned. The fragments are regarded as PII even if it is the whole that does the identifying (but to complicate matters, there are other definitions of "PII" in the US that are more about uniquely identifying).

I like the more conservative emphasis on identifiable information. It encourages people to protect identifiable data before it is actually identified. Privacy protections should be planned in anticipation of identification and not as a reaction.

On more than one occasion in the last few weeks, I've been asked - semi-rhetorically - if the new Australian definition means that even an IP or MAC address nowadays could count as "Personal Information". And I've said that in short, yes it appears so!

Some security people are uncomfortable with this, but why? When it comes down to it, what is so worrying about having to take care of Personal Information? In Australia and in 100-odd other jurisdictions with OECD-based data protection laws, it means that data custodians are required to handle their information assets in accordance with certain sets of Privacy Principles. This is not trivial but neither is it necessarily onerous. If the obligations in the Privacy Principles are examined in a timely manner, alongside compliance, security and information management, then they can be accommodated as just another facet of organisational hygiene.

So for instance, consider a large database of 'anonymous' records indexed by MAC address. This is just the sort of data that's being collected by retailers with in-store cell phone tracking systems, and used to study how customers move through the facility and interact with stores and merchandise. Strictly speaking, if the records are not identifiable then they are not PII and data protection laws do not apply. But the new definition of Personal Information in Australia means IT designers need to consider the prospect of the records becoming identifiable in the event that another data set comes into play. And why not? If anonymous data becomes identified then the data custodian will suddenly find themselves in scope for privacy laws, so it's prudent to plan for that scenario now. Depending on the custodian's risk appetite, any large potentially identifiable data set should be managed with regard to Privacy Principles. These would dictate that the collection of records should be limited to what's required for clear business purposes; that records collected for one purpose not be casually used for other unrelated purposes; and that the organisation be open about what data it collects and why. These sorts of measures are really pretty sensible.
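To make the "another data set comes into play" scenario concrete, here is a minimal Python sketch. Everything in it is hypothetical: the MAC addresses, the field names and the idea of a guest Wi-Fi sign-up list are assumptions chosen only to illustrate how a simple join can turn 'anonymous' tracking records into identified ones.

```python
# Hypothetical data: all MACs, names and fields are made up for illustration.
tracking_records = [
    {"mac": "aa:bb:cc:00:00:01", "zone": "entrance", "minute": 0},
    {"mac": "aa:bb:cc:00:00:01", "zone": "electronics", "minute": 4},
    {"mac": "aa:bb:cc:00:00:02", "zone": "entrance", "minute": 1},
]

# A second data set that arrives later, e.g. a guest Wi-Fi sign-up list (assumed).
wifi_signups = {"aa:bb:cc:00:00:01": "Alice Example"}


def link_records(records, identities):
    """Attach a name to every record whose MAC appears in the second data set."""
    return [
        {**rec, "name": identities[rec["mac"]]}
        for rec in records
        if rec["mac"] in identities
    ]


def identifiable_fraction(records, identities):
    """Share of records that become identified once the data sets are joined."""
    if not records:
        return 0.0
    return len(link_records(records, identities)) / len(records)


linked = link_records(tracking_records, wifi_signups)
fraction = identifiable_fraction(tracking_records, wifi_signups)
```

In this toy case two of the three tracking records pick up a name the moment the sign-up list exists, even though nothing in the tracking data itself changed. That is the sense in which the records were potentially identifiable all along.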

Security practitioners I've spoken with about PI and identifiability are also upset about the ambiguity in the definition of Personal Information, and that classic bit of qualified legalese: "reasonably" identifiable. They complain that the identifiability of a piece of data is relative and fluid, and they don't want to have to interpret the legal definition. But I'm struck here by an inconsistency, because security management is all about uncertainty.

Yes, identifiability changes over time and in response to organisational developments. But security professionals and ICT architects should treat the future identification of a piece of unnamed data as just another sort of threat. The probability that data becomes identifiable depends on a range of variables that are a lot like other factors (like the emergence of other data, changes of circumstance, or developments in data analysis) that are routinely evaluated during risk assessment.

To deal with identifiability and the classification of data as PI or not PI, you should look at the following:

  • consider the broad context of your data assets, how they are used, and how they are linked to other data sets
  • think about how your data assets might grow and evolve over time
  • look at business pressures and plans to expand the detail and value of data
  • make assumptions, and document them, as you do with any business analysis
  • and plan to review periodically.
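One way to keep those assumptions written down and reviewable is a small record per data asset. The sketch below is only an illustration in Python: the `AssetAssessment` class, its field names and the 180-day review interval are all assumptions of mine, not anything prescribed by the Privacy Act.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class AssetAssessment:
    """One identifiability assessment for a data asset (hypothetical schema)."""
    asset: str
    linked_data_sets: list      # other data sets this asset is joined to today
    planned_changes: list       # business plans that may add detail or value
    assumptions: list           # written down, as in any business analysis
    assessed_on: date
    review_interval_days: int = 180  # assumed review cycle, not a legal rule

    def next_review(self) -> date:
        """Date by which the assessment should be revisited."""
        return self.assessed_on + timedelta(days=self.review_interval_days)

    def review_due(self, today: date) -> bool:
        """True once the review date has been reached."""
        return today >= self.next_review()


assessment = AssetAssessment(
    asset="in-store MAC address log",
    linked_data_sets=[],
    planned_changes=["loyalty app may collect device identifiers"],
    assumptions=["no data set we hold maps MAC addresses to customers"],
    assessed_on=date(2013, 9, 1),
)
```

Calling `assessment.review_due(date.today())` then gives a simple prompt to revisit the assumptions, which matters most when one of the listed planned changes actually goes ahead.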

Many organisations maintain a formal Information Assets Inventory and/or an Information Classification regime, and these happen to be ideal management mechanisms in which to classify data as PI or not PI. That decision should be made against the backdrop of the organisation's risk appetite. How conservative or adventurous are you in respect of other risks? If you happen to mis-classify Personal Information, what could be the consequences, and how would the organisation respond? Do some scenario planning, and involve legal, risk and compliance. While you're at it, take the chance to raise awareness outside IT of how information is managed.
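As a rough illustration of classifying against risk appetite, the Python sketch below scores an asset the way many risk registers do, as likelihood times consequence, and compares the result with a threshold. The 1 to 5 scales, the three appetite labels and the threshold values are assumptions for the sake of the example; it is not a legal test of what is "reasonably identifiable".

```python
# Illustrative scales only: the 1-5 scores and appetite thresholds are assumptions.
RISK_APPETITE_THRESHOLDS = {"conservative": 4, "moderate": 9, "adventurous": 15}


def classify(likelihood: int, consequence: int, appetite: str) -> str:
    """Return "PI" when likelihood x consequence exceeds the appetite threshold.

    likelihood:  1-5 estimate that the data becomes identifiable
    consequence: 1-5 estimate of the harm if it was wrongly treated as non-PI
    """
    for name, score in (("likelihood", likelihood), ("consequence", consequence)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} must be between 1 and 5")
    threshold = RISK_APPETITE_THRESHOLDS[appetite]
    return "PI" if likelihood * consequence > threshold else "not PI (review later)"


# The same asset, assessed under two different risk appetites.
cautious_view = classify(2, 3, "conservative")
relaxed_view = classify(2, 3, "moderate")
```

The same data set comes out as PI for a conservative organisation and as not PI (for now) for a more relaxed one, which is the point: the classification is a documented risk decision, to be revisited, and not a fixed property of the data.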

Be prepared to review and change your classifications from non-PI to PI over time. Remember that security managers should always be prepared for change. Embrace the uncertainty in Personal Information!

Truly, privacy can be tackled by IT professionals in much the same way as security. There are no certainties in security and it's the same in privacy. We will never have perfect privacy; rather, privacy management is really about putting reasonable arrangements in place for controlling the flow of Personal Information.

So, if something that's anonymous today might be identified later, you're going to have to deal with that eventually. Why not start the planning now, treat identifiability as just another threat, and roll your privacy and security management together?

Update 30 Sept 2013

I've just come across the 2010 IAPP essay The changing meaning of 'personal data' by William B. Baker and Anthony Matyjaszewski. I think it's an excellent survey of the issues; it's very valuable for its span across dozens of different international jurisdictions. And it's accessible to non-lawyers.

The essay looks specifically at the question of whether IP addresses can be PII, and highlights a trend in the US towards conceding that IP addresses combined with other data can identify, and may therefore count as PII:

Privacy regulators in the European Union regard dynamic IP addresses as personal information. Even though dynamic IP addresses change over time, and cannot be directly used to identify an individual, the EU Article 29 Working Party believes that a copyright holder using "reasonable means" can obtain a user's identity from an IP address when pursuing abusers of intellectual property rights. More recently, other European privacy regulators have voiced similar views regarding permanent IP addresses, noting that they can be used to track and, eventually, identify individuals.
This contrasts sharply to the approach taken in the United States under laws such as COPPA where, a decade ago, the FTC considered whether to classify even static IP addresses as personal information but ultimately rejected the idea out of concern that it would unnecessarily increase the scope of the law. In the past few years, however, the FTC has begun to suggest that IP addresses should be considered PII for much the same reasons as their European counterparts. Indeed, in a recent consent decree, the FTC included within the definition of "nonpublic, individually-identifiable information" an "IP address (or other 'persistent identifier')." And the HIPAA Privacy Rule treats IP addresses as a form of "protected health information" by listing them as a type of data that must be removed from PHI for deidentification purposes.

Baker & Matyjaszewski stress that "foreseeability [of re-identification] may simply be a function of one's ingenuity". And indeed it is. But I would reiterate that foreseeing all sorts of potential adverse events is exactly what security professionals do, day in, day out. Nobody really knows what the chances are of a web site being hacked, or of a trusted employee going feral, but risk assessments routinely involve us gauging these eventualities, which we do by making assumptions, writing them down, drawing conclusions, and reviewing them from time to time. Privacy threats are no different, including the uncertainty about whether data may one day be rendered identifiable.

Posted in Security, Privacy
