Facebook's challenge to the Collection Limitation Principle
An extract from our chapter in the forthcoming Encyclopedia of Social Network Analysis and Mining (to be published by Springer in 2014).
Stephen Wilson, Lockstep Consulting, Sydney, Australia.
Anna Johnston, Salinger Privacy, Sydney, Australia.
- Facebook's business practices pose a risk of non-compliance with the Collection Limitation Principle (OECD Privacy Principle No. 1, and corresponding Australian National Privacy Principles NPP 1.1 through 1.4).
- Privacy problems will likely remain while Facebook's business model remains unsettled, for the business is largely based on collecting and creating as much Personal Information as it can, for subsequent and as yet unspecified monetization.
- If an OSN business doesn't know how it is eventually going to make money from Personal Information, then it has a fundamental difficulty with the Collection Limitation principle.
Facebook is an Internet and societal phenomenon. Launched in 2004, in just a few years it has claimed a significant proportion of the world's population as regular users, becoming by far the most dominant Online Social Network (OSN). With its success has come a good deal of controversy, especially over privacy. Does Facebook herald a true shift in privacy values? Or, despite occasional reckless revelations, are most users no more promiscuous than they were eight years ago? We argue it's too early to draw conclusions about society as a whole from the OSN experience to date. In fact, under laws that currently stand, many OSNs face a number of compliance risks in dozens of jurisdictions.
Over 80 countries worldwide now have enacted data privacy laws, around half of which are based on privacy principles articulated by the OECD. Amongst these are the Collection Limitation Principle which requires businesses to not gather more Personal Information than they need for the tasks at hand, and the Use Limitation Principle which dictates that Personal Information collected for one purpose not be arbitrarily used for others without consent.
Overt collection, covert collection (including generation) and "innovative" secondary use of Personal Information are the lifeblood of Facebook. While Facebook's founder would have us believe that social mores have changed, a clash with orthodox data privacy laws creates challenges for the OSN business model in general.
This article examines a number of areas of privacy compliance risk for Facebook. We focus on how Facebook collects Personal Information indirectly, through the import of members' email address books for "finding friends", and by photo tagging. Taking Australia's National Privacy Principles from the Privacy Act 1988 (Cth) as our guide, we identify a number of potential breaches of privacy law, and issues that may be generalised across all OECD-based privacy environments.
Australian law tends to use the term "Personal Information" rather than "Personally Identifiable Information" although they are essentially synonymous for our purposes.
Terms of reference: OECD Privacy Principles and Australian law
The Organisation for Economic Cooperation and Development has articulated eight privacy principles for helping to protect personal information. The OECD Privacy Principles are as follows:
- 1. Collection Limitation Principle
- 2. Data Quality Principle
- 3. Purpose Specification Principle
- 4. Use Limitation Principle
- 5. Security Safeguards Principle
- 6. Openness Principle
- 7. Individual Participation Principle
- 8. Accountability Principle
Of most interest to us here are principles one and four:
- Collection Limitation Principle: There should be limits to the collection of personal data and any such data should be obtained by lawful and fair means and, where appropriate, with the knowledge or consent of the data subject.
- Use Limitation Principle: Personal data should not be disclosed, made available or otherwise used for purposes other than those specified in accordance with [the Purpose Specification] except with the consent of the data subject, or by the authority of law.
At least 89 counties have some sort of data protection legislation in place [Greenleaf, 2012]. Of these, in excess of 30 jurisdictions have derived their particular privacy regulations from the OECD principles. One example is Australia.
We will use Australia's National Privacy Principles NPPs in the Privacy Act 1988 as our terms of reference for analysing some of Facebook's systemic privacy issues. In Australia, Personal Information is defined as: information or an opinion (including information or an opinion forming part of a database), whether true or not, and whether recorded in a material form or not, about an individual whose identity is apparent, or can reasonably be ascertained, from the information or opinion.
Indirect collection of contacts
One of the most significant collections of Personal Information by Facebook is surely the email address book of those members that elect to have the site help "find friends". This facility provides Facebook with a copy of all contacts from the address book of the member's nominated email account. It's the very first thing that a new user is invited to do when they register. Facebook refer to this as "contact import" in the Data Use Policy (accessed 10 August 2012).
"Find friends" is curtly described as "Search your email for friends already on Facebook". A link labelled "Learn more" in fine print leads to the following additional explanation:
- "Facebook won't share the email addresses you import with anyone, but we will store them on your behalf and may use them later to help others search for people or to generate friend suggestions for you and others. Depending on your email provider, addresses from your contacts list and mail folders may be imported. You should only import contacts from accounts you've set up for personal use." [underline added by us].
Without any further elaboration, new users are invited to enter their email address and password if they have a cloud based email account (such as Hotmail, gmail, Yahoo and the like). These types of services have an API through which any third party application can programmatically access the account, after presenting the user name and password.
It is entirely possible that casual users will not fully comprehend what is happening when they opt in to have Facebook "find friends". Further, there is no indication that, by default, imported contact details are shared with everyone. The underlined text in the passage quoted above shows Facebook reserves the right to use imported contacts to make direct approaches to people who might not even be members.
Importing contacts represents an indirect collection by Facebook of Personal Information of others, without their authorisation or even knowledge. The short explanatory information quoted above is not provided to the individuals whose details are imported and therefore does not constitute a Collection Notice. Furthermore, it leaves the door open for Facebook to use imported contacts for other, unspecified purposes. The Data Use Policy imposes no limitations as to how Facebook may make use of imported contacts.
Privacy harms are possible in social networking if members blur the distinction between work and private lives. Recent research has pointed to the risky use of Facebook by young doctors, involving inappropriate discussion of patients [Moubarak et al, 2010]. Even if doctors are discreet in their online chat, we are concerned that they may run foul of the Find Friends feature exposing their connections to named patients. Doctors on Facebook who happen to have patients in their web mail address books can have associations between individuals and their doctors become public. In mental health, sexual health, family planning, substance abuse and similar sensitive fields, naming patients could be catastrophic for them.
While most healthcare professionals may use a specific workplace email account which would not be amenable to contacts import, many allied health professionals, counselors, specialists and the like run their sole practices as small businesses, and naturally some will use low cost or free cloud-based email services. Note that the substance of a doctor's communications with their patients over web mail is not at issue here. The problem of exposing associations between patients and doctors arises simply from the presence of a name in an address book, even if the email was only ever used for non-clinical purposes such as appointments or marketing.
Photo tagging and biometric facial recognition
One of Facebook's most "innovative" forms of Personal Information Collection would have to be photo tagging and the creation of biometric facial recognition templates.
Photo tagging and "face matching" has been available in social media for some years now. On photo sharing sites such as Picasa, this technology "lets you organize your photos according to the people in them" in the words of the Picasa help pages. But in more complicated OSN settings, biometrics has enormous potential to both enhance the services on offer and to breach privacy.
In thinking about facial recognition, we start once more with the Collection Principle. Importantly, nothing in the Australian Privacy Act circumscribes the manner of collection; no matter how a data custodian comes to be in possession of Personal Information (being essentially any data about a person whose identity is apparent) they may be deemed to have collected it. When one Facebook member tags another in a photo on the site, then the result is that Facebook has overtly but indirectly collected PI about the tagged person.
Facial recognition technologies are deployed within Facebook to allow its servers to automatically make tag suggestions; in our view this process constitutes a new type of Personal Information Collection, on a potentially vast scale.
Biometric facial recognition works by processing image data to extract certain distinguishing features (like the separation of the eyes, nose, ears and so on) and computing a numerical data set known as a template that is highly specific to the face, though not necessarily unique. Facebook's online help indicates that they create templates from multiple tagged photos; if a user removes a tag from one of their photo, that image is not used in the template.
Facebook subsequently makes tag suggestions when a member views photos of their friends. They explain the process thus:
- "We are able to suggest that your friend tag you in a picture by scanning and comparing your friend‘s pictures to information we've put together from the other photos you've been tagged in".
So we see that Facebook must be more or less continuously checking images from members' photo albums against its store of facial recognition templates. When a match is detected, a tag suggestion is generated and logged, ready to be displayed next time the member is online.
What concerns us is that the proactive creation of biometric matches constitutes a new type of PI Collection, for Facebook must be attaching names -- even tentatively, as metadata -- to photos. This is a covert and indirect process.
Photos of anonymous strangers are not Personal Information, but metadata that identifies people in those photos most certainly is. Thus facial recognition is converting hitherto anonymous data -- uploaded in the past for personal reasons unrelated to photo tagging let alone covert identification -- into Personal Information.
Facebook limits the ability to tag photos to members who are friends of the target. This is purportedly a privacy enhancing feature, but unfortunately Facebook has nothing in its Data Use Policy to limit the use of the biometric data compiled through tagging. Restricting tagging to friends is likely to actually benefit Facebook for it reduces the number of specious or mischievous tags, and it probably enhances accuracy by having faces identified only by those who know the individuals.
A fundamental clash with the Collection Limitation Principle
In Australian privacy law, as with the OECD framework, the first and foremost privacy principle concerns Collection. Australia's National Privacy Principle NPP 1 requires that an organisation refrain from collecting Personal Information unless (a) there is a clear need to collect that information; (b) the collection is done by fair means, and (c) the individual concerned is made aware of the collection and the reasons for it.
The core business model of many Online Social Networks is to take advantage of Personal Information, in many and varied ways. From the outset, Facebook founder, Mark Zuckerberg, appears to have been enthusiastic for information built up in his system to be used by others. In 2004, he told a colleague "if you ever need info about anyone at Harvard, just ask" (as reported by Business Insider). Since then, Facebook has experienced a string of privacy controversies, including the "Beacon" sharing feature in 2007, which automatically imported members' activities on external websites and re-posted the information on Facebook for others to see.
Facebook's privacy missteps are characterised by the company using the data it collects in unforeseen and barely disclosed ways. Yet this is surely what Facebook's investors expect the company to be doing: innovating in the commercial exploitation of personal information. The company's huge market valuation derives from a widespread faith in the business community that Facebook will eventually generate huge revenues. An inherent clash with privacy arises from the fact that Facebook is a pure play information company: its only significant asset is the information it holds about its members. There is a market expectation that this asset will be monetized and maximised. Logically, anything that checks the network's flux in Personal Information -- such as the restraints inherent in privacy protection, whether adopted from within or imposed from without -- must affect the company's futures.
Perhaps the toughest privacy dilemma for innovation in commercial Online Social Networking is that these businesses still don't know how they are going to make money from their Personal Information lode. Even if they wanted to, they cannot tell what use they will eventually make of it, and so a fundamental clash with the Collection Limitation Principle remains.
An earlier version of this article was originally published by LexisNexis in the Privacy Law Bulletin (2010).
- Greenleaf G., "Global Data Privacy Laws: 89 Countries, and Accelerating", Privacy Laws & Business International Report, Issue 115, Special Supplement, February 2012 Queen Mary School of Law Legal Studies Research Paper No. 98/2012
- Moubarak G., Guiot A. et al "Facebook activity of residents and fellows and its impact on the doctor--patient relationship" J Med Ethics, 15 December 2010
I've written a new Constellation Research "Quark" Report on the FIDO Alliance ("Fast Identity Online"), a fresh, fast growing consortium working out protocols and standards to connect authentication endpoints to services.
With a degree of clarity that is uncommon in Identity and Access Management (IDAM), FIDO envisages simply "doing for authentication what Ethernet did for networking".
Not quite one year old, 2013, the FIDO Alliance has already grown to nearly 70 members, amongst which are heavyweights like Google, Lenovo, MasterCard, Microsoft and PayPal as well as a dozen biometrics vendors and several global players in the smartcard supply chain.
STOP PRESS! Discover Card joined a few days ago at board level.
FIDO is different. The typical hackneyed IDAM elevator pitch in promises to "fix the password crisis" but usually with unintended impacts on how business is done. Most IDAM initiatives unwittingly convert clear-cut technology problems into open-ended business transformation problems.
In welcome contrast, FIDO’s mission is clear cut: it seeks to make strong authentication interoperable between devices and servers. When users have activated FIDO-compliant endpoints, reliable fine-grained information about the state of authentication becomes readily discoverable by any server, which can then make access control decisions according to its own security policy.
FIDO is not about federation; it's not even about "identity"!
With its focus, pragmatism and critical mass, FIDO is justifiably today’s go-to authentication industry standards effort.
For more detail, please have a look at The FIDO Alliance at the Constellation Research website.
The cover of Newsweek magazine on 27 July 1970 featured an innocent couple being menaced by cameras and microphones and new technologies like computer punch cards and paper tape. The headline hollered “IS PRIVACY DEAD?”.
The same question has been posed every few years ever since.
In 1999, Sun Microsystems boss Scott McNally urged us to “get over” the idea we have “zero privacy”; in 2008, Ed Giorgio from the Office of the US Director of National Intelligence chillingly asserted that “privacy and security are a zero-sum game”; Facebook’s Mark Zuckerberg proclaimed in 2010 that privacy was no longer a “social norm”. And now the scandal around secret surveillance programs like PRISM and the Five Eyes’ related activities looks like another fatal blow to privacy. But the fact that cynics, security zealots and information magnates have been asking the same rhetorical question for over 40 years suggests that the answer is No!
PRISM, as revealed by whistle blower Ed Snowden, is a Top Secret electronic surveillance program of the US National Security Agency (NSA) to monitor communications traversing most of the big Internet properties including, allegedly, Apple, Facebook, Google, Microsoft, Skype, Yahoo and YouTube. Relatedly, intelligence agencies have evidently also been obtaining comprehensive call records from major telephone companies, eavesdropping on international optic fibre cables, and breaking into the cryptography many take for granted online.
In response, forces lined up at tweet speed on both sides of the stereotypical security-privacy divide. The “hawks” say privacy is a luxury in these times of terror, if you've done nothing wrong you have nothing to fear from surveillance, and in any case, much of the citizenry evidently abrogates privacy in the way they take to social networking. On the other side, libertarians claim this indiscriminate surveillance is the stuff of the Stasi, and by destroying civil liberties, we let the terrorists win.
Governments of course are caught in the middle. President Obama defended PRISM on the basis that we cannot have 100% security and 100% privacy. Yet frankly that’s an almost trivial proposition. It's motherhood. And it doesn’t help to inform any measured response to the law enforcement challenge, for we don’t have any tools that would let us design a computer system to an agreed specification in the form of, say “98% Security + 93% Privacy”. It’s silly to us the language of “balance” when we cannot measure the competing interests objectively.
Politicians say we need a community debate over privacy and national security, and they’re right (if not fully conscientious in framing the debate themselves). Are we ready to engage with these issues in earnest? Will libertarians and hawks venture out of their respective corners in good faith, to explore this difficult space?
I suggest one of the difficulties is that all sides tend to confuse privacy for secrecy. They’re not the same thing.
Privacy is a state of affairs where those who have Personal Information (PII) about us are constrained in how they use it. In daily life, we have few absolute secrets, but plenty of personal details. Not many people wish to live their lives underground; on the contrary we actually want to be well known by others, so long as they respect what they know about us. Secrecy is a sufficient but not necessary condition for privacy. Robust privacy regulations mandate strict limits on what PII is collected, how it is used and re-used, and how it is shared.
Therefore I am a privacy optimist. Yes, obviously too much PII has broken the banks in cyberspace, yet it is not necessarily the case that any “genie” is “out of the bottle”.
If PII falls into someone’s hands, privacy and data protection legislation around the world provides strong protection against re-use. For instance, in Australia Google was found to have breached the Privacy Act when its StreetView cars recorded unencrypted Wi-Fi transmissions; the company cooperated in deleting the data concerned. In Europe, Facebook’s generation of tag suggestions without consent by biometric processes was ruled unlawful; regulators there forced Facebook to cease facial recognition and delete all old templates.
We might have a better national security debate if we more carefully distinguished privacy and secrecy.
I see no reason why Big Data should not be a legitimate tool for law enforcement. I have myself seen powerful analytical tools used soon after a terrorist attack to search out patterns in call records in the vicinity to reveal suspects. Until now, there has not been the technological capacity to use these tools pro-actively. But with sufficient smarts, raw data and computing power, it is surely a reasonable proposition that – with proper and transparent safeguards in place – population-wide communications metadata can be screened to reveal organised crimes in the making.
A more sophisticated and transparent government position might ask the public to give up a little secrecy in the interests of national security. The debate should not be polarised around the falsehood that security and privacy are at odds. Instead we should be debating and negotiating appropriate controls around selected metadata to enable effective intelligence gathering while precluding unexpected re-use. If (and only if) credible and verifiable safeguards can be maintained to contain the use and re-use of personal communications data, then so can our privacy.
For me the awful thing about PRISM is not that metadata is being mined; it’s that we weren’t told about it. Good governments should bring the citizenry into their confidence.
Are we prepared to honestly debate some awkward questions?
- Has the world really changed in the past 10 years such that surveillance is more necessary now? Should the traditional balances of societal security and individual liberties enshrined in our traditional legal structures be reviewed for a modern world?
- Has the Internet really changed the risk landscape, or is it just another communications mechanism. Is the Internet properly accommodated by centuries old constitutions?
- How can we have confidence in government authorities to contain their use of communications metadata? Is it possible for trustworthy new safeguards to be designed?
Many years ago, cryptographers adopted a policy of transparency. They have forsaken secret encryption algorithms, so that the maths behind these mission critical mechanisms is exposed to peer review and ongoing scrutiny. Secret algorithms are fragile in the long term because it’s only a matter of time before someone exposes them and weakens their effectiveness. Security professionals have a saying: “There is no security in obscurity”.
For precisely the same reason, we must not have secret government monitoring programs either. If the case is made that surveillance is a necessary evil, then it would actually be in everyone’s interests for governments to run their programs out in the open.
As we head towards 2014, de-identification of personal data sets is going to be a hot issue. I saw several things at last week's Constellation Connected Enterprise conference (CCE) that will make sure of this!
First, recall that in Australia a new definition of Personal Information (PI or "PII") means that anonymous data that can potentially be re-identified in future may have to be classified as PII today. I recently discussed how security and risk practitioners can deal with the uncertainty in re-identifiability.
And there's a barrage of new tracking, profiling and interior geo-location technologies (like Apple's iBeacon) which typically come with a promise of anonymity. See for example Tesco's announcement of face scanning for targeting adverts at their UK petrol stations.
The promise of anonymity is crucial, but it is increasingly hard to keep. Big Data techniques that join de-identified information to other data sets are able to ind correlations and reverse the anonymisation process. The science of re-identification started with the work of Dr Latanya Sweeny who famously identified a former governor and his medical records using zip codes and electoral roll data; more recently we've seen DNA "hackers" who can unmask anonymous DNA donors by joining genomic databases to public family tree information.
At CCE we saw many exciting Big Data developments, which I'll explore in more detail in coming weeks. Business Intelligence as-a-service is expanding rapidly, and is being flipped my innovative vendors to align (whether consciously or not) with customer centric Vendor Relationship Management models of doing business. And there are amazing new tools for enriching unstructured data, like newly launched Paxata's Adaptive Data Preparation Platform. More to come.
With the ability to re-identify data comes Big Responsibilities. I believe that to help businesses meet their privacy promises, we're going to need new tools to measure de-identification and hence gauge the risk of re-identification. It seems that some new generation data analytics products will allow us to run what-if scenarios to help understand the risks.
Just before CCE I also came across some excellent awareness raising materials from Voltage Security in Cupertino. Voltage CTO Terence Spies shared with me his "Deidentification Taxonomy" reproduced here with his kind permission. Voltage are leaders in Format Preserving Encryption and Tokenization -- typically used to hide credit card numbers from thieves in payment systems -- and they're showing how the tools may be used more broadly for de-identifying databases. I like the way Terence has characterised the reversibility (or not) of de-identification approaches, and further broken out various tokenization technologies.
Reference: Voltage Security. Reproduced with permission.
These are the foundations of the important new science of de-identification. Privacy engineers need to work hard at re-identification, so that consumers do not lose faith in the important promises made that so much data collected from their daily movements through cyber space are indeed anonymous.
The Consumerization of Identity: A collision of Worlds
US: Nov 13 1:00-1:30PM Pacific
Aus: Nov 14 8:00-8:30AM AEDST
What happens when the irresistible force of Social Logon hits the immoveable object of enterprise risk management?
People love Social Logon! Twitter, Facebook and Google handles are used everyday to access countless digital services. But they are not yet "business grade". No enterprise should dilute its risk management by accepting social identities willy nilly.
This webinar will unpack what Social Logons tell us about users, compare and contrast Facebook and LinkedIn identities, and try to foresee how they should evolve if they are going to meet business needs.
The webinar will cover:
- What is the Consumerization of IT?
- What is Federated Identity?
- The State of the "identity ecosystem"
- Pros and Cons of Federation
- The Two Dimensions of Social Identities
- What needs to happen for Social Logon to become "Business Grade"?
THE BOTTOM LINE
Consumers’ understanding of their digital identities is evolving fast, while the social Identity Providers (IDPs) continue to explore how to monetize the privileged information that they have about their users. Online consumers have a keen sense of digital reputation and they appreciate how to build a powerful social graph; this awareness will soon inform business-grade identities. But even more importantly, social IDPs must prvide businesses precise information about users to help businesses manage transaction risk. Consumerization opens up exciting new possibilities, but it must not dilute the way businesses know their customers, parts, staff and users.
This is a copy of an op-ed I wrote in IT News on 20 September.
It’s been suggested that with Apple’s introduction of biometric technology, the “i” in iPhone now stands for “identity”. Maybe “i” is for “ironic” because there is another long-awaited feature that would have had much more impact on the device’s identity credentials.
The fingerprint scanner has appeared in the new iPhone 5s, as predicted, and ahead of Near Field Communications capability. In my view, NFC is much more important for identity. NFC is usually thought of as a smartcard emulator, allowing mobile devices to appear to merchant terminals as payments instruments, but the technology has another lesser known mode: reader emulation.
NFC devices can be programmed to interface with any contactless card: smart driver licenses, health cards, employee ID and so on. The power to identify and authenticate to business and enterprise apps using real world credentials would be huge for identity management, but it seems we have to wait.
Meanwhile, what does the world’s instantly most famous fingerprint reader mean for privacy and security? As is the case with all things biometric, the answers are not immediately apparent.
Biometric authentication might appear to go with mobiles like strawberries and cream. Smartphones are an increasingly central fixture in daily life and yet something like 40% of users fail to protect this precious asset with a PIN. So automatic secure logon is an attractive idea.
There are plenty of options for biometrics in smartphones, thanks to the built in camera and other sensors. Android devices have had face unlock for a long time now, and iris authentication is also available. Start-up EyeVerify scans the vein pattern in the whites of the eye; gait recognition has been mooted; and voice recognition would seem an obvious alternative.
With its US$365M acquisition of Authentec in 2012, Apple made a conspicuous commitment to a biometric technology that was always going to involve significant new hardware in the handset. The iPhone 5s incorporates a capacitive fingerprint detector in a subtly modified Home button. Ostensibly the button operates as it always has, but it automatically scans the user’s finger in the time it takes to press and release. Self-enrolment is said to be quite painstaking, with the pad of the finger being comprehensively scanned and memorised. This allows the relatively small scanner to still do its job no matter what fraction of the fingertip happens to be presented. Up to five alternate fingers can be enrolled, which allows for a fall-back if the regular digit is damaged, as well as additional users like family members to be registered.
This much we know. What’s less clear is the security performance of the iPhone 5s.
Remember that all biometrics commit two types of error: False Rejects where an enrolled user is mistakenly blocked, and False Accepts where someone else is confused for the legitimate user. Both type of error are inevitable, because biometrics must be designed to tolerate a little variability. Each time a body part is presented, it will look a little different; fingers get dirty or scarred or old; sensors get scratched; angle and pressure vary. But in allowing for change, the biometric is liable to occasionally think similar people are the same.
The propensity to make either False Positive or False Negative errors must be traded off in every biometric application, to deliver reasonable security and convenience. Data centre biometrics for instance are skewed towards security and as a result can be quite tricky and time consuming to use. With consumer electronics, the biometric trade-off goes very much the other way. Consumers only ever directly experience one type of error – False Rejects – and they can be very frustrating. Most users don’t in fact ever lose their phone, so False Accepts are irrelevant.
Thus the iPhone 5s finger reader will be heavily biased towards convenience, but at what cost? Frustratingly, it is almost impossible to tell. Independent biometrics researchers like Jim Wayman have long warned that lab testing is a very poor predictor of biometric performance in the field. The FBI advises that field performance is always significantly worse than reported by vendors, especially in the face of determined attack.
All we have to go on is anecdotes. We’re assured that the Authentec technology has “liveness detection” to protect against fake fingers but it’s a hollow promise. There are no performance standards or test protocols for verifying the claim of liveness detection.
The other critical promise made by Apple is that the fingerprint templates stored securely with the handset will never made accessible to third party applications nor the cloud. This is a significant privacy measure, and is to be applauded. It’s vital that Apple stick to this policy.
But here’s the rub for identity: if the biometric matching is confined to the phone, then it’s nothing more than a high tech replacement for the PIN, with indeterminate effectiveness. Certainly smartphones have great potential for identity management, but the advantages are to be gained from digital wallets and NFC, not from biometrics.
Some have quipped that the “S” in iPhone 5S stands for “security” but to me it’s more like “speculation”.
IBM Zurich researcher Diego Ortiz-Yepes recently revealed a new Two Factor Authentication technique in which the bona fides of a user of a mobile app are demonstrated via a contactless smartcard waved over the mobile device. The technique leverages NFC -- but as a communications medium, not as a payments protocol. The method appears to be compatible with a variety of smartcards, capable of carrying a key specific to the user and performing some simple cryptographic operations.
This is actually really big.
I hope the significance is not lost in the relentless noise of new security gadget announcements, because it's the most important new approach to authentication we've seen for a long long time. The method can easily be adopted by the NFC and smartcard ecosystems with no hardware changes. And with mobile adoption at a tipping point, we need true advances in security like this to be adopted as widely and as quickly as possible. If we ignore it, future generations will look back on the dawn of m-business as another opportunity lost.
A golden opportunity to address an urgent problem
Mobile represents the first greenfield computing platform in thirty years. Not since the PC have we seen a whole new hardware/software/services/solutions environment emerge.
It's universally acknowledged that general purpose PCs and Internet Protocol for that matter were never engineered with security much in mind. The PC and the Internet were independently architected years before the advent of e-commerce, and without any real sense of the types of mission critical applications they would come to support.
I remember visiting Silicon Valley in 1998 when I was with KPMG's pioneering PKI team, working on, amongst other things, the American Bar Association e-signature guidelines. We were meeting with visionaries, asking Will anyone ever actually shop "online"?. Nobody really knew! But at startling speed, commodity PCs and the Internet were indeed being joined up for shopping and so much more: payments, and e-health, and the most sensitive corporate communications. Yet no mainstream computer manufacturer or standards body ever re-visited their designs with these uses in mind.
And so today, a decade and a half on (or a century in "Internet years") we have security boffins earnestly pronouncing "well you know, there is no identity layer in the Internet". No kidding! Identity theft and fraud are rife, with as yet no industry-wide coordinated response. Phishing and pharming continue at remarkable rates. "Advanced Persistent Threats" (APTs) have been industrialised, through malware exploit kits like Blackhole which even come with licensing deals and help desk support that rivals that of legitimate software companies. Even one of the world's very best security companies, RSA, fell victim to an APT attack that started with an trusted employee opening a piece of spam.
But in the nick of time, along comes the mobile platform, with all the right attributes to make safe the next generation of digital transactions. Most mobile devices come with built-in "Secure Elements": certifiably secure, tamper-resistant chips in which critical cryptographic operations take place. Historically the SIM card (Subscriber Identification Module) has been the main Secure Element; "NFC" technology (Near Field Communications) introduces a new generation of Secure Elements, with vastly more computing power and flexibility than SIMs, including the ability to run mission critical apps entirely within the safe chip.
The Secure Element should be a godsend. It is supported in the NFC architecture by Trusted Service Managers (TSMs) which securely transfer critical data and apps from verified participants (like banks) into the consumers' devices. Technically, the TSMs are a lot like the cell phone personalisation infrastructure that seamlessly governs SIM cards worldwide, and secures mobile billing and roaming. Admittedly, TSMs have been a bit hard to engage with; to date, they're monopolised by telcos that control access to the Secure Elements and have sought to lease memory at exorbitant rates. But if we collectively have the appetite at this time to solve cyberspace security then mobile devices and the NFC architecture in particular provide a once-in-a-generation opportunity. We could properly secure the platform of choice for the foreseeable future.
The IBM Two Factor Authentication demo
Before explaining what IBM's Ortiz-Yepes has done, let's briefly review NFC, because it is often misconstrued. NFC technology has a strong brand that identifies it with contactless payments, but there is much more to it.
"Near Field Communications" is a short range radio frequency comms protocol, suited to automatic device-to-device interfaces. To the layperson, NFC is much the same as Bluetooth or Wi-Fi, the main difference being the intended operating range: 10s of metres for Wi-Fi; metres for Bluetooth; and mere centimetres for NFC.
NFC has come to be most often used for carrying wireless payments instructions from a mobile phone to a merchant terminal. It's the technology underneath MasterCard PayPass and Visa payWave, in which your phone is loaded with an app and account information to make it behave like a contactless credit card.
The NFC system has a few modes of operation. The one used for PayPass and payWave is "Card Emulation Mode" which is pretty self explanatory. Here an NFC phone appears to a merchant terminal as though it (the phone) is a contactless payment card. As such, the terminal and phone exchange payments messages exactly as if a card was involved; cardholder details and payment amount are confirmed and send on to the merchant's bank for processing. NFC payments has turned out to be a contentious business, with disconcertingly few success stories, and a great deal of push-back from analysts. The jury is still out on whether NFC payments will ever be sustainable.
However, NFC technology has other tricks. Another mode is "Reader Emulation Mode" in which the mobile phone reads from (and writes to) a contactless smartcard. As an identity analyst, I find this by far the more interesting option, and it's the one that IBM researchers have exploited in their new 2FA method.
According to what's been reported at CNET and other news outlets, a mobile and a smartcard are used in what we call a "challenge-response" combo. The basic authentication problem is to get the user to prove who she is, to the app's satisfaction. In the demo, the user is invited to authenticate herself to an app using her smartcard. Under the covers, a random challenge number is generated at a server, passed over the Internet or phone network to the mobile device which in turn sends it over NFC to the smartcard. The card then 'transforms' the challenge code into a response using a key specific to the user, and returns it to the app, which passes it back to the server. The server then verifies that the response corresponds to the challenge, and if it does, we know that the right card and therefore the right user is present.
NOTE:Technically there are a number of ways the challenge can be transformed into a response code capable of being linked back to the original number. The most elegant way is to use asymmetric cryptography, aka digital signatures. The card would use a unique private key to encrypt the challenge into a response; the server subsequently uses a public key to try and decrypt the response. If the decrypted response matches the challenge, then we know the public key matches the private key. A PKI verifies that the expected user controls the given public-private key pair, thus authenticating that user to the card and the app.
Further, I'd suggest the challenge-response can be effected without a server, if a public key certificate binding the user to the key pair is made available to the app. The challenge could be created in the app, sent over NFC to the card, signed by the private key in the card, and returned by NFC to be verified in the app. Local processing in this way is faster and more private involving a central server.
Significance of the demo
The IBM demonstration is a terrific use of the native cryptographic powers now commonplace in smartcards and mobile apps. No hardware modifications are needed to deploy the 2FA solution; all that's required is that a private key specific to the user be loaded into their smartcard at the time the card is personalised. Almost all advanced payments, entitlements and government services cards today can be provisioned in such a manner. So we can envisage a wonderful range of authorization scenarios where existing smartcards would be used by their holders for strong access control. For example:
- Employee ID card (including US Govt PIV-I) presented to an enterprise mobile app, to access and control corporate applications, authorize purchase orders, sign company documents etc
- Government ID card presented to a services app, for G2C functions
- Patient ID or health insurance card presented to a health management app, for patient access to personal health records, prescriptions, claims etc.
- Health Provider ID card presented to a professional app, to launch e-health functions like dispensing, orders, admissions, insurance payments etc,
- Credit Card presented to a payment app, for online shopping.
I can't believe the security industry won't now turn to use smartcards and similar chipped devices for authenticating users to mobile devices for a whole range of applications. We now have a golden opportunity to build identity and authorization security into the mobile platform in its formative stages, avoiding the awful misadventures that continue to plague PCs. Let's not blow it!
The Australian parliament recently revised our definition of Personal Information (or, roughly equivalently, what Americans call Personally Indentifiable Information, or PII). We have lowered the bar somewhat, with a new regime that will categorise even more examples of data as Personal Information (PI). And this has triggered fresh anxieties amongst security practitioners about different interpretations of the law. But I like to think the more liberal definition provides an opportunity for security professionals to actually embrace privacy practice, precisely because, more than ever, privacy management is about uncertainty and risk.
Since 1988, the Australian Privacy Act has defined Personal Information as:
- information or an opinion ... whether true or not, and whether recorded in a material form or not, about an individual whose identity is apparent, or can reasonably be ascertained, from the information or opinion (underline added).
The Privacy Amendment (Enhancing Privacy Protection) Act 2012 says that PI is:
- information or an opinion about an identified individual, or an individual who is reasonably identifiable: (a) whether the information or opinion is true or not; and (b) whether the information or opinion is recorded in a material form or not (underline added).
What matters for the present discussion is that the amendments remove the previous condition that identification of the individual be done from the Personal Information itself. So under the new definition, we are required to consider data as PI if there is a reasonable likelihood that it may be identified in future by any means!
- [Note that much of the discussion that follows applies equally to the US concept of PII. The US General Services Administration defines Personally Identifiable Information as information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual (underline added). This means that items of data can constitute PII if other data can be combined to identify the person concerned. The fragments are regarded as PII even if it is the whole that does the identifying.
- Crucially, the middle I in PII stands for Identifiable, and not, as many people presume, Identifying. PII need not uniquely identify a person, it merely needs to be identifiable with a person. ]
On more than one occasion in the last few weeks, I've been asked -- semi-rhetorically -- if the new Australian definition means that even an IP or MAC address nowadays could count as "Personal Information". And I've said that in short, yes it appears so!
Some security people are uncomfortable with this, but why? When it comes down to it, what is so worrying about having to take care of Personal Information? In Australia and in 100-odd other jurisdictions with OECD based data protection laws, it means that data custodians are required to handle their information assets in accordance with certain sets of Privacy Principles. This is not trivial but neither is it necessarily onerous. If the obligations in the Privacy Principles are examined in a timely manner, alongside compliance, security and information management, then they can be accommodated as just another facet of organisational hygiene.
So for instance, consider a large data base of 'anonymous' records indexed by MAC address. This is just the sort of data that's being collected by retailers with in-store cell phone tracking systems, and used to study how customers move through the facility and interact with stores and merchandise. Strictly speaking, if the records are not identifiable then they are not PII and data protection laws do not apply. But the new definition of Personal Information in Australia means IT designers need to consider the prospect of the records becoming identifiable in the event that another data set comes into play. And why not? If anonymous data becomes identified then the data custodian will suddenly find themselves in scope for privacy laws, so it's prudent to plan for that scenario now. Depending on the custodian's risk appetite, any large potentially identifiable data set should be managed with regard to Privacy Principles. These would dictate that the collection of records should be limited to what's required for clear business purposes; that records collected for one purpose not be casually used for other unrelated purposes; and that the organisation be open about what data it collects and why. These sorts of measures are really pretty sensible.
Security practitioners I've spoken with about PI and identifiability are also upset about the ambiguity in the definition of Personal Information, and that classic bit of qualified legalese: "reasonably" identifiable. They complain that the identifiability of a piece of data is relative and fluid, and they don't want to have to interpret the legal definition. But I'm struck here by an inconsistency, because security management is all about uncertainty.
Yes, identifiability changes over time and in response to organisational developments. But security professionals and ICT architects should treat the future identification of a piece of unnamed data as just another sort of threat. The probability that data becomes identifiable depends on a range of variables that are a lot like other factors (like the emergence of other data, changes of circumstance, or developments in data analysis) that are routinely evaluated during risk assessment.
To deal with identifiability and the classification of data as PI or not PI, you should look at the following:
- consider the broad context of your data assets, how they are used, and how they are linked to other data sets
- think about how your data assets might grow and evolve over time
- look at business pressures and plans to expand the detail and value of data
- make assumptions, and document them, as you do with any business analysis
- and plan to review periodically.
Many organisations maintain a formal Information Assets Inventory and/or an Information Classification regime, and these happen to be ideal management mechanisms in which to classify data as PI or not PI. That decision should be made against the backdrop of the organisation's risk appetite. How conservative or adventurous are in you respect of other risks? If you happen to mis-classify Personal Information, what could be the consequences, and how would the organisation respond? Do some scenario planning, and involve legal, risk and compliance. While you're at it, take the chance to raise awareness outside IT of how information is managed.
Be prepared to review and change your classifications from non-PI to PI over time. Remember that security managers should always be prepared for change. Embrace the uncertainty in Personal Information!
Truly, privacy can be tackled by IT professionals in much the same way as security. There are no certainties in security and it's the same in privacy. We will never have perfect privacy; rather, privacy management is really about putting reasonable arrangements in place for controlling the flow of Personal Information.
So, if something that's anonymous today, might be identified later, you're going have to deal with that eventually. Why not start the planning now, treat identifiability as just another threat, and roll your privacy and security management together?
Update 30 Sept 2013
I've just come across the 2010 iapp essay The changing meaning of 'personal data' by William B. Baker and Anthony Matyjaszewski. I think it's an excellent survey of the issues; it's very valuable for its span across dozens of different international jurisdictions. And it's accessible to non-lawyers.
The essay looks specifically at the question of whether IP addresses can be PII, and highlights a trend in the US towards conceding that IP addresses combined with other data cane identify, and may therefore count as PII:
Privacy regulators in the European Union regard dynamic IP addresses as personal information. Even though dynamic IP addresses change over time, and cannot be directly used to identify an individual, the EU Article 29 Working Party believes that a copyright holder using "reasonable means" can obtain a user's identity from an IP address when pursuing abusers of intellectual property rights. More recently, other European privacy regulators have voiced similar views regarding permanent IP addresses, noting that they can be used to track and, eventually, identify individuals.
This contrasts sharply to the approach taken in the United States under laws such as COPPA where, a decade ago, the FTC considered whether to classify even static IP addresses as personal information but ultimately rejected the idea out of concern that it would unnecessarily increase the scope of the law. In the past few years, however, the FTC has begun to suggest that IP addresses should be considered PII for much the same reasons as their European counterparts. Indeed, in a recent consent decree, the FTC included within the definition of "nonpublic, individually-identifiable information" an “IP address (or other "persistent identifier")." And the HIPAA Privacy Rule treats IP addresses as a form of "protected health information" by listing them as a type of data that must be removed from PHI for deidentification purposes.
Baker & Matyjaszewski stress that "foreseeability [of re-identification] may simply be a function of one's ingenuity". And indeed it is. But I would reiterate that foreseeing of all sorts of potential adverse events is exactly what security professionals do, day in, day out. Nobody really knows what the chances are of a web site being hacked, or of a trusted employee going feral, but risk assessments routinely involve us gauging these eventualities, which we do by making assumptions, writing them down, drawing conclusions, and reviewing them from time to time. Privacy threats are no different, including the uncertainty about whether data may one day be rendered identifiable.
A week and a bit after Apple released the iPhone 5S with its much vaunted "TouchID" biometric, the fingerprint detector has been subverted by the Chaos Computer Club (CCC). So what are we to make of this?
Security is about economics. The CCC attack is not a trivial exercise. It entailed a high resolution photograph, high res printing, and a fair bit of phaffing about with glue and plastics. Plus of course the attacker needs to have taken possession of the victim's phone because one good thing about Apple's biometric implementation is that the match is done on the device. So one question is, Does the effort required to beat the system outweigh the gains to be made by a successful attacker? For a smartphone with a smart user (who takes care not to load up their device with real valuables) the answer is probably no.
But security is also about transparency and verification, and TouchID is the latest example of the biometrics industry falling short of security norms. Apple has released its new "security" feature with no security specs. No stated figures on False Accept Rate, False Reject Rate or Failure to Enroll Rate, and no independent test results. All we have is anecdotes that the False Reject Rate is very very low (in keeping with legendary Apple human factors engineering), and odd claims that a dead finger won't activate the Authentec technology. It's held out to be a security measure but the manufacturer feels no need to predict how well the device will withstand criminal attack.
There is no shortage of people lining up to say the CCC attack is not a practical threat. Which only begs the question, ok, just how "secure" do we want biometrics to be? Crucially, that's actually impossible to answer, because there are still no agreed real life test protocols for any biometric, and no liveness detection standards. Vendors can make any marketing claim they like for a biometric solution without being held to account. Contrast this Wild West situation with the rigor applied to any other branch of security like cryptographic algorithms, key lengths, Trusted Platform Modules, smartcards and Secure Elements.
You can imagine Bart Simpson defending the iPhone 5S fingerprint scanner:
"It won't be spoofed!
I never said it couldn't be spoofed!
It doesn't really matter if it is spoofed!!!"
Demonstrations of biometric failings need to be taken more seriously - not because they surprise hardened security professionals (they don't) but because the demos lay bare the laziness of many biometrics vendors and advocates, and their willful disregard for security professionalism. People really need to be encouraged to think more critically about biometrics. For one thing, they need to understand subtleties like the difference between the One-to-One authentication of the iPhone 5S and One-to-Many authentication of fanciful fingerprint payment propositions like PayTango.
The truth is that consumer biometrics are all about convenience, not security. And that would be ok, if only manufacturers were honest about it.
The cover of Newsweek magazine on 27 July 1970 featured a cartoon couple cowered by computer and communications technology, and the urgent all-caps headline “IS PRIVACY DEAD?”
Four decades on, Newsweek is dead, but we’re still asking the same question.
Every generation or so, our notions of privacy are challenged by a new technology. In the 1880s (when Warren and Brandeis developed the first privacy jurisprudence) it was photography and telegraphy; in the 1970s it was computing and consumer electronics. And now it’s the Internet, a revolution that has virtually everyone connected to everyone else (and soon everything) everywhere, and all of the time. Some of the world’s biggest corporations now operate with just one asset – information – and a vigorous “publicness” movement rallies around the purported liberation of shedding what are said by writers like Jeff Jarvis (in his 2011 book “Public Parts”) to be old fashioned inhibitions. Online Social Networking, e-health, crowd sourcing and new digital economies appear to have shifted some of our societal fundamentals.
However the past decade has seen a dramatic expansion of countries legislating data protection laws, in response to citizens’ insistence that their privacy is as precious as ever. And consumerized cryptography promises absolute secrecy. Privacy has long stood in opposition to the march of invasive technology: it is the classical immovable object met by an irresistible force.
So how robust is privacy? And will the latest technological revolution finally change privacy forever?
Soaking in information
We live in a connected world. Young people today may have grown tired of hearing what a difference the Internet has made, but a crucial question is whether relatively new networking technologies and sheer connectedness are exerting novel stresses to which social structures have yet to adapt. If “knowledge is power” then the availability of information probably makes individuals today more powerful than at any time in history. Search, maps, Wikipedia, Online Social Networks and 3G are taken for granted. Unlimited deep technical knowledge is available in chat rooms; universities are providing a full gamut of free training via Massive Open Online Courses (MOOCs). The Internet empowers many to organise in ways that are unprecedented, for political, social or business ends. Entirely new business models have emerged in the past decade, and there are indications that political models are changing too.
Most mainstream observers still tend to talk about the “digital” economy but many think the time has come to drop the qualifier. Important services and products are, of course, becoming inherently digital and whole business categories such as travel, newspapers, music, photography and video have been massively disrupted. In general, information is the lifeblood of most businesses. There are countless technology-billionaires whose fortunes are have been made in industries that did not exist twenty or thirty years ago. Moreover, some of these businesses only have one asset: information.
Banks and payments systems are getting in on the action, innovating at a hectic pace to keep up with financial services development. There is a bewildering array of new alternative currencies like Linden dollars, Facebook Credits and Bitcoins – all of which can be traded for “real” (reserve bank-backed) money in a number of exchanges of varying reputation. At one time it was possible for Entropia Universe gamers to withdraw dollars at ATMs against their virtual bank balances.
New ways to access finance have arisen, such as peer-to-peer lending and crowd funding. Several so-called direct banks in Australia exist without any branch infrastructure. Financial institutions worldwide are desperate to keep up, launching amongst other things virtual branches and services inside Online Social Networks (OSNs) and even virtual worlds. Banks are of course keen to not have too many sales conducted outside the traditional payments system where they make their fees. Even more strategically, banks want to control not just the money but the way the money flows, because it has dawned on them that information about how people spend might be even more valuable than what they spend.
Privacy in an open world
For many for us, on a personal level, real life is a dynamic blend of online and physical experiences. The distinction between digital relationships and flesh-and-blood ones seems increasingly arbitrary; in fact we probably need new words to describe online and offline interactions more subtly, without implying a dichotomy.
Today’s privacy challenges are about more than digital technology: they really stem from the way the world has opened up. The enthusiasm of many for such openness – especially in Online Social Networking – has been taken by some commentators as a sign of deep changes in privacy attitudes. Facebook's Mark Zuckerberg for instance said in 2010 that “People have really gotten comfortable not only sharing more information and different kinds, but more openly and with more people - and that social norm is just something that has evolved over time”. And yet serious academic investigation of the Internet’s impact on society is (inevitably) still in its infancy. Social norms are constantly evolving but it’s too early to tell to if they have reached a new and more permissive steady state. The views of information magnates in this regard should be discounted given their vested interest in their users' promiscuity.
At some level, privacy is about being closed. And curiously for a fundamental human right, the desire to close off parts of our lives is relatively fresh. Arguably it’s even something of a “first world problem”. Formalised privacy appears to be an urban phenomenon, unknown as such to people in villages when everyone knew everyone – and their business. It was only when large numbers of people congregated in cities that they became concerned with privacy. For then they felt the need to structure the way they related to large numbers of people – family, friends, work mates, merchants, professionals and strangers – in multi-layered relationships. So privacy was borne of the first industrial revolution. It has taken prosperity and active public interest to create the elaborate mechanisms that protect our personal privacy from day to day and which we take for granted today: the postal services, direct dial telephones, telecommunications regulations, individual bedrooms in large houses, cars in which we can escape or a while, and now of course the mobile handset.
Privacy is about respect and control. Simply put, if someone knows me, then they should respect what they know; they should exercise restraint in how they use that knowledge, and be guided by my wishes. Generally, privacy is not about anonymity or secrecy. Of course, if we live life underground then unqualified privacy can be achieved, yet most of us exist in diverse communities where we actually want others to know a great deal about us. We want merchants to know our shipping address and payment details, healthcare providers to know our intimate details, hotels to know our travel plans and so on. Practical privacy means that personal information is not shared arbitrarily, and that individuals retain control over the tracks of their lives.
Big Data: Big Future
Big Data tools are being applied everywhere, from sifting telephone call records to spot crimes in the planning, to DNA and medical research. Every day, retailers use sophisticated data analytics to mine customer data, ostensibly to better uncover true buyer sentiments and continuously improve their offerings. Some department stores are interested in predicting such major life changing events as moving house or falling pregnant, because then they can target whole categories of products to their loyal customers.
Real time Big Data will become embedded in our daily lives, through several synchronous developments. Firstly computing power, storage capacity and high speed Internet connectivity all continue to improve at exponential rates. Secondly, there are more and more “signals” for data miners to choose from. No longer do you have to consciously tell your OSN what you like or what you’re doing, because new augmented reality devices are automatically collecting audio, video and locational data, and trading it around a complex web of digital service providers. And miniaturisation is leading to a whole range of smart appliances, smart cars and even smart clothes with built-in or ubiquitous computing.
The privacy risks are obvious, and yet the benefits are huge. So how should we think about the balance in order to optimise the outcome? Let’s remember that information powers the new digital economy, and the business models of many major new brands like Facebook, Twitter, Four Square and Google incorporate a bargain for Personal Information. We obtain fantastic services from these businesses “for free” but in reality they are enabled by all that information we give out as we search, browse, like, friend, tag, tweet and buy.
The more innovation we see ahead, the more certain it seems that data will be the core asset of cyber enterprises. To retain and even improve our privacy in the unfolding digital world, we must be able to visualise the data flows that we’re engaged in, evaluate what we get in return for our information, and determine a reasonable trade of costs and benefits
Is Privacy Dead? If the same rhetorical question needs to be asked over and over for decades, then it’s likely the answer is no.