In 2002, a couple of Japanese visitors to Australia swapped passports with each other before walking through an automatic biometric border control gate being tested at Sydney airport. The facial recognition algorithm falsely matched each of them to the other's passport photo. These gentlemen were in fact part of an international aviation industry study group, and were in the habit of trying to fool biometric systems then being trialed around the world.
When I heard about this successful prank, I quipped that the algorithms were probably written by white people - because we think all Asians look the same. Colleagues thought I was making a typical sick joke, but actually I was half-serious. It did seem to me that the choice of facial features thought to be most distinguishing in a facial recognition model could be culturally biased.
Since that time, border control face recognition has come a long way, and I have not heard of such errors for many years. Until today.
The San Francisco Chronicle of July 21 carries a front page story about the cloud storage services of Google and Flickr mislabeling some black people as gorillas (see updated story, online). It's a quite incredible episode. Google has apologized. Its Chief Architect of Social, Yonatan Zunger, seems mortified, judging by his tweets as reported, and is investigating.
The newspaper report quotes machine learning experts who suggest programmers with limited experience of diversity may be to blame. That seems plausible to me, although I wonder where exactly the algorithm R&D gets done, and how much control is to be had over the biometric models and their parameters along the path from basic research to application development.
So man has literally made software in his own image.
The public is now being exposed to Self Driving Cars, which are heavily reliant on machine vision, object recognition and artificial intelligence. If this sort of software can't tell people from apes in static photos given lots of processing time, how does it perform in real time, with fleeting images, subject to noise, and with much greater complexity? It's easy to imagine any number of real life scenarios where an autonomous car will have to make a split-second decision between two pretty similar looking objects appearing unexpectedly in its path.
The general expectation is that Self Driving Cars (SDCs) will be tested to exhaustion. And so they should be. But if cultural partiality is affecting the work of programmers, it's possible that testers suffer the same blind spots without knowing it. Maybe the offending photo labeling programs were never verified with black people. So how are the test cases for SDCs being selected? What might happen when an SDC ventures into environments and neighborhoods where its programmers have never been?
Everybody in image processing and artificial intelligence should be humbled by the racist photo labeling. With the world being eaten by software, we need to reflect really deeply on how such design howlers arise, and, frankly, double-check whether we're ready to let computer programs get behind the wheel.
Identity online is a vexed problem. The majority of Internet fraud today can be related to weaknesses in the way we authenticate people electronically. Internet identity is terribly awkward too. Unfortunately today we still use password techniques dating back to 1960s mainframes that were designed for technicians, by technicians.
Our identity management problems also stem from over-reach. For one thing, the information era heralded new ways to reach and connect with people, with almost no friction. We may have taken too literally the old saw “information wants to be free.” Further, traditional ways of telling who people are, through documents and “old boys networks,” create barriers, which are anathema to new school Internet thinkers.
For the past 10-to-15 years, a heady mix of ambitions has informed identity management theory and practice: improve usability, improve security and improve “trust.” Without ever pausing to unravel the rainbow, the identity and access management industry has created grandiose visions of global “trust frameworks” to underpin a utopia of seamless stranger-to-stranger business and life online.
Well-resourced industry consortia and private-public partnerships have come and gone over the past decade or more. Numerous “trust” start-up businesses have launched and failed. Countless new identity gadgets, cryptographic algorithms and payment schemes have been tried.
And yet the identity problem is still with us. Why is identity online so strangely resistant to these well-meaning efforts to fix it? In particular, why is federated identity so dramatically easier said than done?
Identification is a part of risk management. In business, service providers use identity to manage the risk that they might be dealing with the wrong person. Different transactions carry different risks, and identification standards are varied accordingly. Conversely, if a provider cannot be sure enough who someone is, they now have the tools to withhold or limit their services. For example, when an Internet customer signs in from an unusual location, payment processors can put a cap on the dollar amounts they will authorize.
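The idea of varying identification standards with transaction risk can be sketched in a few lines of Python. Everything here is hypothetical and purely illustrative: the `Transaction` shape, the location signal and the dollar cap are invented, not drawn from any real payment processor's rules.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    amount: float          # dollar value requested
    usual_location: bool   # did the sign-in come from a familiar place?

# Hypothetical policy: when the identification signal is weak
# (an unusual sign-in location), cap the amount that will be authorised.
UNUSUAL_LOCATION_CAP = 100.0

def authorised_amount(txn: Transaction) -> float:
    """Return how much of the requested amount to authorise,
    given the strength of the identification signal."""
    if txn.usual_location:
        return txn.amount
    return min(txn.amount, UNUSUAL_LOCATION_CAP)

print(authorised_amount(Transaction(250.0, usual_location=True)))   # full amount
print(authorised_amount(Transaction(250.0, usual_location=False)))  # capped
```

The point of the sketch is that the service is not withheld outright; it is limited in proportion to the residual identification risk.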
Across our social and business walks of life, we have distinct ways of knowing people, which yield a rich array of identities by which we know and show who we are to others. These identities have evolved over time to suit different purposes. Different relationships rest on different particulars, and so identities naturally become specific, not general.
The human experience of identity is one of ambiguity and contradictions. Each of us simultaneously holds a weird and wonderful ensemble of personal, family, professional and social identities. Each is different, sometimes radically so. Some of us lead quite secret lives, and I’m not thinking of anything salacious, but maybe just the role-playing games that provide important escapes from the humdrum.
Most of us know how it feels when identities collide. There’s no better example than what I call the High School Reunion Effect: that strange dislocation you feel when you see acquaintances for the first time in decades. You’ve all moved on, you’ve adopted new personae in new contexts – not the least of which is the one defined by a spouse and your own new family. Yet you find yourself re-winding past identities, relating to your past contemporaries as you all once were, because it was those school relationships, now fossilised, that defined you.
Frankly, we’ve made a mess of the pivotal analogue-to-digital conversion of identity. In real life we know identity is malleable and relative, yet online we’ve rendered it crystalline and fragile.
We’ve come close to the necessary conceptual clarity. Some 10 years ago a network of “identerati” led by Kim Cameron of Microsoft composed the “Laws of Identity,” which contained a powerful formulation of the problem to be addressed. The Laws defined Digital Identity as “a set of claims made [about] a digital subject.”
Your Digital Identity is a proxy for a relationship, pointing to a suite of particulars that matter about you in a certain context. When you apply for a bank account, when you subsequently log on to Internet banking, when you log on to your work extranet, or to Amazon or PayPal or Twitter, or if you want to access your electronic health record, the relevant personal details are different each time.
The flip side of identity management is privacy. If authentication concerns what a Relying Party needs to know about you, then privacy is all about what they don’t need to know. Privacy amounts to information minimization; security professionals know this all too well as the “Need to Know” principle.
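The “Need to Know” principle can be shown with a minimal sketch: the individual (or their agent) holds a full set of attributes, and a Relying Party receives only the ones it actually needs. The attribute names and the helper function are invented for illustration, not taken from any real identity scheme:

```python
# A person's full set of attributes, held by the individual or their agents.
full_record = {
    "name": "Alex Example",
    "date_of_birth": "1980-01-01",
    "is_over_18": True,
    "payment_card": "4111-XXXX-XXXX-1111",
    "health_record_id": "HR-12345",
}

def minimal_disclosure(record: dict, needed: set) -> dict:
    """Release only the attributes a Relying Party actually needs."""
    return {k: v for k, v in record.items() if k in needed}

# A liquor store needs proof of age, and nothing else.
print(minimal_disclosure(full_record, {"is_over_18"}))

# A payment processor needs card details, not your health record.
print(minimal_disclosure(full_record, {"name", "payment_card"}))
```

Each Relying Party sees a different, minimal slice of the same person, which is exactly how identities behave in real life.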
All attempts at grand global identities to date have failed. The Big Certification Authorities of the 1990s reckoned a single, all-purpose digital certificate would meet the needs of all business, but they were wrong. Ever more sophisticated efforts since then have also failed, such as the Information Card Foundation, the Liberty Alliance and the Australian banking sector’s Trust Centre.
Significantly, federation for non-trivial identities only works within regulatory monocultures – for example the US Federal Bridge CA, or the Scandinavian BankID network – where special legislation authorises banks and governments to identify customers by the one credential. The current National Strategy for Trusted Identities in Cyberspace has pondered legislation to manage liability but has balked. The regulatory elephant remains in the room.
As an aside, obviously social identities like Facebook and Twitter handles federate very nicely, but these are issued by organisations that don't really know who we are, and they're used by web sites that don't really care who we are; social identity federation is a poor model for serious identity management.
A promising identity development today is the Open Identity Foundation’s Attribute Exchange Network, a new architecture seeking to organise how identity claims may be traded. The Attribute Exchange Network resonates with a growing realization that, in the words of Andrew Nash, a past identity lead at Google and at PayPal, “attributes are at least as interesting as identities – if not more so.”
If we drop down a level and deal with concrete attribute data instead of abstract identities, we will start to make progress on the practical challenges in authentication: better resistance to fraud and account takeover, easier account origination and better privacy.
My vision is that by 2019 we will have a fresh marketplace of Attribute Providers. The notion of “Identity Provider” should die off, for identity is always in the eye of the Relying Party. What we need online is an array of respected authorities and agents that can vouch for our particulars. Banks can provide reliable electronic proof of our payment card numbers; government agencies can attest to our age and biographical details; and a range of private businesses can stand behind attributes like customer IDs, membership numbers and our retail reputations.
In five years’ time I expect we will adopt a much more precise language to describe how to deal with people online, and it will reflect more faithfully how we’ve transacted throughout history. As the old Italian proverb goes: It is nice to “trust” but it’s better not to.
This article first appeared as "Abandoning identity in favor of attributes" in Secure ID News, 2 December, 2014.
The Australian government's new Digital Transformation Office (DTO) is a welcome initiative, and builds on a generally strong e-government program of many years' standing.
But I'm a little anxious about one plank of the DTO mission: the development of a "Trusted Digital Identity Framework".
We've had several attempts at this sort of thing over many years, and we really need a fresh approach next time around.
I hope we don't re-hash the same old hopes for "trust" and "identity" as we have for over 15 years. The real issues can be expressed more precisely. How do we get reliable signals about the people and entities we're trying to deal with online? How do we equip individuals to be able to show relevant signals about themselves, sufficient to get something done online? What are the roles of governments and markets in all this?
Frankly, I loathe the term "Trust Framework"! Trust is hugely distracting if not irrelevant. And we already have frameworks in spades.
Every year the Constellation SuperNova Awards recognise eight individuals for their leadership in digital business. Nominate yourself or someone you know by August 7, 2015.
The SuperNova Awards honour leaders who demonstrate excellence in the application and adoption of new and emerging technologies. In its fifth year, the SuperNova Awards program will recognise eight individuals who demonstrate true leadership in digital business through their application of new and emerging technologies. Constellation Research is searching for leaders and corporate teams who have innovatively applied disruptive technologies to their businesses, to adapt to the rapidly-changing digital business environment. Special emphasis will be given to projects that seek to redefine how the enterprise uses technology on a large scale.
We’re searching for the boldest, most transformative technology projects out there. Apply for a SuperNova Award by filling out the application here: http://www.constellationr.com/node/3137/apply
SuperNova Award Categories
- Consumerization of IT & The New C-Suite - The Enterprise embraces consumer tech, and perfects it.
- Data to Decisions - Using data to make informed business decisions.
- Digital Marketing Transformation - Put away that megaphone. Marketing in the digital age requires a new approach.
- Future of Work - The processes and technologies addressing the rapidly shifting work paradigm.
- Matrix Commerce - Commerce responds to changing realities from the supply chain to the storefront.
- Next Generation Customer Experience - Customers in the digital age demand seamless service throughout all lifecycle stages and across all channels.
- Safety and Privacy - Not 'security'. Safety and Privacy is the art and science of protecting information assets, including your most important assets: your people.
- Technology Optimization & Innovation - Innovative methods to balance innovation and budget requirements.
Five reasons to apply for a SuperNova Award
- Exposure to the SuperNova Award judges, comprising the top influencers in enterprise technology
- Case study highlighting the achievements of the winners written by Constellation analysts
- Complimentary admission to the SuperNova Award Gala Dinner and Constellation's Connected Enterprise for all finalists November 4-6, 2015 (NB: lodging and travel not included)
- One year unlimited access to Constellation's research library
- Winners featured on Constellation's blog and weekly newsletter.
Bank robber Willie Sutton, when asked why he robbed banks, answered "That's where the money is". It's the same with breaches. Large databases are the targets of people who want data. It's that simple.
Having said that, there are different sorts of breaches and corresponding causes. Most high profile breaches are obviously driven by financial crime, where attackers typically grab payment card details. Breaches are what powers most card fraud. Organised crime gangs don't pilfer card numbers one at a time from people's computers or insecure websites (and so the standard advice to consumers to change their passwords every month and to make sure they see a browser padlock is nice, but it will do little to stop mass card fraud).
Instead of blaming end user failings, we need to really turn up the heat on enterprise IT. The personal data held by big merchant organisations (including even mundane operations like car parking chains) is now worth many hundreds of millions of dollars. If this kind of value was in the form of cash or gold, you'd see Fort Knox-style security around it. Literally. But how much money does even the biggest enterprise invest in security? And what do they get for their money?
The grim reality is that no amount of conventional IT security today can prevent attacks on assets worth billions of dollars. The simple economics is against us. It's really more a matter of luck than good planning that some large organisations have yet to be breached (and that's only so far as we know).
Organised crime is truly organised. If it's card details they want, they go after the big data stores, at payments processors and large retailers. The sophistication of these attacks is amazing even to security pros. The attack on Target's Point of Sale terminals for instance was in the "can't happen" category.
The other types of criminal breach include mischief, as when the iCloud photos of celebrities were leaked last year, hacktivism, and political or cyber terrorist attacks, like the one on Sony.
There's some evidence that identity thieves are turning now to health data to power more complex forms of crime. Instead of stealing and replaying card numbers, identity thieves can use deeper, broader information like patient records to either commit fraud against health system payers, or to open bogus accounts and build them up into complex scams. The recent Anthem database breach involved extensive personal records on 80 million individuals; we have yet to see how these details will surface in the identity black markets.
The ready availability of stolen personal data is one factor we find to be driving Identity and Access Management (IDAM) innovation; see "The State of Identity Management in 2015". Next generation IDAM will eventually make stolen data less valuable, but for the foreseeable future, all enterprises holding large customer datasets will remain prime targets for identity thieves.
Now let's not forget simple accidents. The Australian government, for example, has had some clangers, though these can happen to any big organisation. A few months ago a staffer accidentally attached the wrong file to an email, and thus released the passport details of the G20 leaders. Before that, we saw a spreadsheet holding personal details of thousands of asylum seekers get mistakenly pasted into a government website's HTML.
A lesson I want to bring out here is the terrible complexity and fragility of our IT systems. It doesn't take much for human error to have catastrophic results. Who among us has not accidentally hit 'Reply All' or attached the wrong file to an email? If you did an honest Threat & Risk Assessment on these sorts of everyday office systems, you'd have to conclude they are not safe to handle sensitive data nor to be operated by most human beings. But of course we simply can't afford not to use office IT. We've created a monster.
Again, criminal elements know this. The expert cryptographer Bruce Schneier once said "amateurs hack systems, professionals hack people". Access control on today's sprawling, complex computer systems is generally poor, leaving the way open for inside jobs. Just look at the Chelsea Manning case, one of the worst breaches of all time, made possible by granting too-high access privileges to too many staffers.
Outside government, access control is worse, and so is access logging - so system administrators often can't tell there's even been a breach until circumstantial evidence emerges. I am sure the majority of breaches are occurring without anyone knowing. It's simply inevitable.
Look at hotels. There are occasional reports of hotel IT breaches, but they are surely happening continuously. The guest details held in hotels are staggering - payment card details, license plates, travel itineraries including airline flight details; even passport numbers are held by some places. And these days, with global hotel chains, the whole booking database is available to a rogue employee from anywhere in the world, 24-7.
Please, don't anyone talk to me about PCI-DSS! The Payment Card Industry Data Security Standards for protecting cardholder details haven't had much effect at all. Some of the biggest breaches of all time have affected top tier merchants and payments processors which appear to have been PCI compliant. Yet the lawyers for the payments institutions will always argue that such-and-such a company wasn't "really" compliant. And the PCI auditors walk away from any liability for what happens in between audits. You can understand their position; they don't want to be accountable for wrongdoings or errors committed behind their backs. However, cardholders and merchants are caught in the middle. If a big department store passes its PCI audits, surely we can expect it to be reasonably secure all year round? No, it turns out that the day after a successful audit, an IT intern can mis-configure a firewall or forget a patch; all those defences become useless, and the audit is rendered meaningless.
Which reinforces my point about the fragility of IT: it's impossible to make lasting security promises anymore.
In any case, PCI is really just a set of data handling policies and promises. They improve IT security hygiene, and ward off amateur attacks. But they are useless against organised crime or inside jobs.
There is an increasingly good argument to outsource data management. Rather than maintain brittle databases in the face of so much risk, companies are instead turning to large reputable cloud services, where the providers have the scale, resources and attention to detail to protect data in their custody. I previously looked at what matters in choosing cloud services from a geographical perspective in my Constellation Research report "Why Cloud Geography Matters in a Post-Snowden/NSA Era". And in forthcoming research I'll examine a broader set of contract-related KPIs to help buyers make the right choice of cloud service provider.
If you asked me what to do about data breaches, I'd say the short-to-medium term solution is to get with the strength and look for managed security services from specialist providers. In the longer term, we will have to see grassroots re-engineering of our networks and platforms, to harden them against penetration, and to lessen the opportunity for identity theft.
In the meantime, you can hope for the best, if you plan for the worst.
Actually, no, you can't hope.
The Australian government is to revamp the troubled Personally Controlled Electronic Health Record (PCEHR). In line with the Royle Review from Dec 2013, it is reported that patient participation is to change from the current Opt-In model to Opt-Out; see "Govt to make e-health records opt-out" by Paris Cowan, IT News.
That is to say, patient data from hospitals, general practice, pathology and pharmacy will be added by default to a central longitudinal health record, unless patients take steps (yet to be specified) to disable sharing.
The main reason for switching the consent model is simply to increase the take-up rate. But it's a much bigger change than many seem to realise.
The government is asking the community to trust it to hold essentially all medical records. Are the PCEHR's security and privacy safeguards up to scratch to take on this grave responsibility? I argue the answer is no, on two grounds.
Firstly there is the practical matter of PCEHR's security performance to date. It's not good, based on publicly available information. On multiple occasions, prescription details have been uploaded from community pharmacy to the wrong patient's records. There have been a few excuses made for this error, with blame sheeted home to the pharmacy. But from a system's perspective -- and health care is all about the systems -- you cannot pass the buck like that. Pharmacists are using a PCEHR system that was purportedly designed for them. And it was subject to system-wide threat & risk assessments that informed the architecture and design of not just the electronic records system but also the patient and healthcare provider identification modules. How can it be that the PCEHR allows such basic errors to occur?
Secondly and really fundamentally, you simply cannot invert the consent model as if it's a switch in the software. The privacy approach is deep in the DNA of the system. Not only must PCEHR security be demonstrably better than experience suggests, but it must be properly built in, not retrofitted.
Let me explain how the consent approach crops up deep in the architecture of something like PCEHR. During analysis and design, threat & risk assessments (TRAs) and privacy impact assessments (PIAs) are undertaken, to identify things that can go wrong, and to specify security and privacy controls. These controls generally comprise a mix of technology, policy and process mechanisms. For example, if there is a risk of patient data being sent to the wrong person or system, that risk can be mitigated a number of ways, including authentication, user interface design, encryption, contracts (that obligate receivers to act responsibly), and provider and patient information. The latter are important because, as we all should know, there is no such thing as perfect security. Mistakes are bound to happen.
One of the most fundamental privacy controls is participation. Individuals usually have the ultimate option of staying away from an information system if they (or their advocates) are not satisfied with the security and privacy arrangements. Now, these are complex matters to evaluate, and it's always best to assume that patients do not in fact have a complete understanding of the intricacies, the pros and cons, and the net risks. People need time and resources to come to grips with e-health records, so a default opt-in affords them that breathing space. And it errs on the side of caution, by requiring a conscious decision to participate. In stark contrast, a default opt-out policy embodies a position that the scheme operator believes it knows best, and is prepared to make the decision to participate on behalf of all individuals.
Such a position strikes many as beyond the pale, just on principle. But if opt-out is the adopted policy position, then clearly it has to be based on a risk assessment where the pros indisputably out-weigh the cons. And this is where making a late switch to opt-out is unconscionable.
You see, in an opt-in system, during analysis and design, whenever a risk is identified that cannot be managed down to negligible levels by way of technology and process, the ultimate safety net is that people don't need to use the PCEHR. It is a formal risk management ploy (a part of the risk manager's toolkit) to sometimes fall back on the opt-in policy. In an opt-in system, patients sign an agreement in which they accept some risk. And the whole security design is predicated on that.
Look at the most recent PIA done on the PCEHR in 2011; section 9.1.6 "Proposed solutions - legislation" makes it clear that opt-in participation is core to the existing architecture. The PIA makes a "critical legislative recommendation" including:
- a number of measures to confirm and support the 'opt in' nature of the PCEHR for consumers (Recommendations 4.1 to 4.3) [and] preventing any extension of the scope of the system, or any change to the 'opt in' nature of the PCEHR.
The PIA at section 2.2 also stresses that a "key design feature of the PCEHR System ... is opt in – if a consumer or healthcare provider wants to participate, they need to register with the system." And that the PCEHR is "not compulsory – both consumers and healthcare providers choose whether or not to participate".
A PDF copy of the PIA report, which was publicly available at the Dept of Health website for a few years after 2011, is archived here.
The fact is that if the government changes the PCEHR from opt-in to opt-out, it will invalidate the security and privacy assessments done to date. The PIAs and TRAs will have to be repeated, and the project must be prepared for major redesign.
The Royle Review report (PDF) did in fact recommend "a technical assessment and change management plan for an opt-out model ..." (Recommendation 14) but I am not aware that such a review has taken place.
To look at the seriousness of this another way, think about "Privacy by Design", the philosophy that's being steadily adopted across government. In 2014 NEHTA wrote in a submission (PDF) to the Australian Privacy Commissioner:
- The principle that entities should employ “privacy by design” by building privacy into their processes, systems, products and initiatives at the design stage is strongly supported by NEHTA. The early consideration of privacy in any endeavour ensures that the end product is not only compliant but meets the expectations of stakeholders.
One of the tenets of Privacy by Design is that you cannot bolt on privacy after a design is done. Privacy must be designed into the fabric of any system from the outset. All the way along, PCEHR has assumed opt-in, and the last PIA enshrined that position.
If the government was to ignore its own Privacy by Design credo, and not revisit the PCEHR architecture, it would be an amazing breach of the public's trust in the healthcare system.
Every now and then, a large organisation in the media spotlight will experience the special pain of having a password accidentally revealed in the background of a photograph or TV spot. Security commentator Graham Cluley has recorded a lot of these misadventures, most recently at a British national rail control room, and before that, in the Superbowl nerve centre and an emergency response agency.
Security folks love their schadenfreude but what are we to make of these SNAFUs? Of course, nobody is perfect. And some plumbers have leaky taps.
But these cases hold much deeper lessons. These are often critical infrastructure providers (consider that on financial grounds, there may be more at stake in Superbowl operations than the railways). The outfits making kindergarten security mistakes will have been audited many times over. So how on earth do they pass?
Posting passwords on the wall is not a random error - it's systemic. Some administrators do it out of habit, or desperation. They know it's wrong, but they do it anyway, and they do it with such regularity it gets caught on TV.
I really want to know: did none of the security auditors at any of these organisations ever notice the passwords in plain view? Or do the personnel do a quick clean-up on the morning of each audit, only to revert to reality in between? Either way, here's yet more proof that security audit, frankly, is a sick joke, and that security practices aren't worth the paper they're printed on.
Security orthodoxy holds that people and process are more fundamental than technology, and that people are the weakest link. That's why we have security management processes and security audits. It's why whole industries have been built around security process standards like ISO 27000. So it's unfathomable to me that companies with passwords caught on camera can have ever passed their audits.
Security isn't what people think it is. Instead of meticulous procedures and hawk-eyed inspections, too often it's just simple people going through the motions. Security isn't intellectually secure. The things we do in the name of "security" don't make us secure.
Let's not dismiss password flashing as a temporary embarrassment for some poor unfortunates. This should be humiliating for the whole information security industry. We need another way.
Picture credits: Graham Cluley.
Ray Wang tells us that writing a book and launching a company are incredibly fulfilling things to do - but ideally, not at the same time. He thought it would take a year to write "Disrupting Digital Business", but since it overlapped with building Constellation Research, it took three! Then again, his book is all the richer for that experience.
Ray is on a world-wide book tour (tweeting under the hash tag #cxotour). I was thrilled to participate in the Melbourne leg last week. We convened a dinner at Melbourne restaurant The Deck and were joined by a good cross section of Australian private and public sector businesses. There were current and recent executives from Energy Australia, Rio Tinto, the Victorian Government and Australia Post among others, plus the founders of several exciting local start-ups. And we were lucky to have special guests Brian Katz and Ben Robbins - two renowned mobility gurus.
The format for all the launch events includes one or two short topical speeches from Constellation analysts and Associates, and a fireside chat by Ray. In Melbourne, we were joined by two of Australia's deep digital economy experts, Gavin Heaton and Joanne Jacobs. Gavin got us going on the night, surveying the importance of innovation, and the double edged opportunities and threats of digital disruption.
Then Ray spoke off-the-cuff about his book, summarising years of technology research and analysis, and a great many cases of business disruption, old and new. Ray has an encyclopedic grasp of tech-driven successes and failures going back decades, yet his presentations are always up-to-the-minute and full of practical can-do calls to action. He's hugely engaging, and having him on a small stage for a change lets him have a real conversation with the audience.
Speaking with no notes and PowerPoint-free, Ray ranged across all sorts of disruptions in all sorts of sectors, including:
- Sony's double cassette Walkman (which Ray argues playfully was their "last innovation")
- Coca Cola going digital, and the speculative "ten cent sip"
- the real lesson of the iPhone: geeks spend time arguing about whether Apple's technology is original or appropriated, when the point is their phone disrupted 20 or more other business models
- the contrasting Boeing 787 Dreamliner and Airbus A380 mega jumbo - radically different ways to maximise the one thing that matters to airlines: dollars per passenger-mile, and
- Uber, which observers don't always fully comprehend as a rich mix of mobility, cloud and Big Data.
And I closed the scheduled part of the evening with a provocation on privacy. I asked the group to think about what it means to call any online business practice "creepy". Have community norms and standards really changed in the move online? What's worse: government surveillance for political ends, or private sector surveillance for profit? If we pay for free online services with our personal information, do regular consumers understand the bargain? And if cynics have been asking "Is Privacy Dead?" for over 100 years, doesn't it mean the question is purely rhetorical? Who amongst us truly wants privacy to be over?!
The discussion quickly attained a life of its own - muscular, but civilized. And it provided ample proof that whatever you think about privacy, it is complicated and surprising, and definitely disruptive! (For people who want to dig further into the paradoxes of modern digital privacy, Ray and I recently recorded a nice long chat about it).
Here are some of the Digital Disruption tour dates coming up:
Acknowledgement: Daniel Barth-Jones kindly engaged with me after this blog was initially published, and pointed out several significant factual errors, for which I am grateful.
In 2014, the New York Taxi & Limousine Commission (TLC) released a large "anonymised" dataset containing 173 million taxi rides taken in 2013. Soon after, software developer Vijay Pandurangan managed to undo the hashed taxi registration numbers. Subsequently, privacy researcher Anthony Tockar went on to combine public photos of celebrities getting in or out of cabs, to recreate their trips. See Anna Johnston's analysis here.
This re-identification demonstration has been used by some to bolster a general claim that anonymity online is increasingly impossible.
On the other hand, medical research advocates like Columbia University epidemiologist Daniel Barth-Jones argue that the practice of de-identification can be robust and should not be dismissed as impractical on the basis of demonstrations such as this. The identifiability of celebrities in these sorts of datasets is a statistical anomaly, reasons Barth-Jones, and should not be used to frighten regular people out of participating in medical research on anonymised data. He wrote in a blog that:
- "However, it would hopefully be clear that examining a miniscule proportion of cases from a population of 173 million rides couldn’t possibly form any meaningful basis of evidence for broad assertions about the risks that taxi-riders might face from such a data release (at least with the taxi medallion/license data removed as will now be the practice for FOIL request data)."
As a health researcher, Barth-Jones is understandably worried that re-identification of small proportions of special cases is being used to exaggerate the risks to ordinary people. He says that the HIPAA de-identification protocols, if properly applied, leave no significant risk of re-identification. But even if that's the case, HIPAA processes are not applied to data across the board. The TLC data was described as "de-identified", and the fact that any people at all (even stand-out celebrities) could be re-identified from the data does create a broad basis for concern - "de-identified" is not what it seems. Barth-Jones stresses that in the TLC case, the de-identification was fatally flawed [technically: it's no use hashing data like registration numbers with limited value ranges, because the hashed values can be reversed by brute force] but my point is this: who among us can tell the difference between poorly de-identified and "properly" de-identified?
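The brute-force reversal mentioned above is worth making concrete. Because taxi medallion numbers follow a small, known format, an attacker can simply hash every possible value and match the results against the "anonymised" column. The sketch below uses a hypothetical, simplified medallion format (digit, letter, two digits) purely for illustration; the real TLC formats differ, but the principle is identical.

```python
import hashlib
import string

def build_reverse_lookup(candidates):
    """Hash every candidate value and map hash -> original value."""
    return {hashlib.md5(c.encode()).hexdigest(): c for c in candidates}

# Hypothetical, simplified medallion space: e.g. "7A12"-style IDs.
# Only 10 * 26 * 100 = 26,000 possibilities - trivial to enumerate.
candidates = [f"{d}{letter}{n:02d}"
              for d in "0123456789"
              for letter in string.ascii_uppercase
              for n in range(100)]

lookup = build_reverse_lookup(candidates)

# Reversing a "de-identified" hash is now just a dictionary lookup.
hashed = hashlib.md5("7A12".encode()).hexdigest()
print(lookup[hashed])  # recovers "7A12"
```

The point of the sketch is that hashing only protects values drawn from a space too large to enumerate; a few tens of thousands of medallion numbers offer no protection at all.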
And how long can "properly de-identified" last? What does it mean to say casually that only a "minuscule proportion" of data can be re-identified? In this case, the re-identification of celebrities was helped by the fact that lots of photos of them are readily available on social media; but with so many photos of ordinary people now in the public domain, regular people are only going to become easier to identify.
But my purpose here is not to play what-if games, and I know Daniel advocates statistically rigorous measures of identifiability. We agree on that -- in fact, over the years, we have agreed on most things. The point I am trying to make in this blog post is that, just as nobody should exaggerate the risk of re-identification, nor should anyone play it down. Claims of de-identification are made almost daily for really crucial datasets, like compulsorily retained metadata, public health data, biometric templates, social media activity used for advertising, and web searches. Some of these claims are made with statistical rigor, using formal standards like the HIPAA protocols; but other times the claim is casual, made with no qualification, with the aim of comforting end users.
"De-identified" is a helluva promise to make, with far-reaching ramifications. Daniel says de-identification researchers use the term with caution, knowing there are technical qualifications around the finite probability of individuals remaining identifiable. But my position is that the fine print doesn't translate to the general public who only hear that a database is "anonymous". So I am afraid the term "de-identified" is meaningless outside academia, and in casual use is misleading.
Barth-Jones objects to the conclusion that "it's virtually impossible to anonymise large data sets" but in an absolute sense, that claim is surely true. If any proportion of people in a dataset may be identified, then that data set is plainly not "anonymous". Moreover, as statistics and mathematical techniques (like facial recognition) improve, and as more ancillary datasets (like social media photos) become accessible, the proportion of individuals who may be re-identified will keep going up. [Readers who wish to pursue these matters further should look at the recent Harvard Law School online symposium on "Re-identification Demonstrations", hosted by Michelle Meyer, in which Daniel Barth-Jones and I participated, among many others.]
Both sides of this vexed debate need more nuance. Privacy advocates have no wish to quell medical research per se, nor do they call for absolute privacy guarantees, but we do seek full disclosure of the risks, so that the cost-benefit equation is understood by all. One of the obvious lessons in all this is that "anonymous" or "de-identified" on their own are poor descriptions. We need tools that meaningfully describe the probability of re-identification. If statisticians and medical researchers take "de-identified" to mean "there is an acceptably small probability, namely X percent, of identification" then let's have that fine print. Absent the detail, lay people can be forgiven for thinking re-identification isn't going to happen. Period.
And we need policy and regulatory mechanisms to curb inappropriate re-identification. Anonymity is a brittle, essentially temporary, and inadequate privacy tool.
I argue that the act of re-identification ought to be treated as an act of Algorithmic Collection of PII, and regulated as just another type of collection, albeit an indirect one. If a statistical process results in a person's name being added to a hitherto anonymous record in a database, it is as if the data custodian went to a third party and asked them "do you know the name of the person this record is about?". The fact that the data custodian was clever enough to avoid having to ask anyone about the identity of people in the re-identified dataset does not alter the privacy responsibilities arising. If the effect of an action is to convert anonymous data into personally identifiable information (PII), then that action collects PII. And in most places around the world, any collection of PII automatically falls under privacy regulations.
It looks like we will never guarantee anonymity, but the good news is that for privacy, we don't actually need to. Privacy is the protection you need when your affairs are not anonymous, for privacy is a regulated state where organisations that have knowledge about you are restrained in what they do with it. Equally, the ability to de-anonymise should be restricted in accordance with orthodox privacy regulations. If a party chooses to re-identify people in an ostensibly de-identified dataset, without a good reason and without consent, then that party may be in breach of data privacy laws, just as they would be if they collected the same PII by conventional means like questionnaires or surveillance.
Surely we can all agree that re-identification demonstrations serve to shine a light on the comforting claims made by governments for instance that certain citizen datasets can be anonymised. In Australia, the government is now implementing telecommunications metadata retention laws, in the interests of national security; the metadata we are told is de-identified and "secure". In the UK, the National Health Service plans to make de-identified patient data available to researchers. Whatever the merits of data mining in diverse fields like law enforcement and medical research, my point is that any government's claims of anonymisation must be treated critically (if not skeptically), and subjected to strenuous and ongoing privacy impact assessment.
Privacy, like security, can never be perfect. Privacy advocates must avoid giving the impression that they seek unrealistic guarantees of anonymity. There must be more to privacy than identity obscuration (to use a more technically correct term than "de-identification"). Medical research should proceed on the basis of reasonable risks being taken in return for beneficial outcomes, with strong sanctions against abuses including unwarranted re-identification. And then there wouldn't need to be a moral panic over re-identification if and when it does occur, because anonymity, while highly desirable, is not essential for privacy in any case.
The State Of Identity Management in 2015
Constellation Research recently launched the "State of Enterprise Technology" series of research reports. These assess the current enterprise innovations which Constellation considers most crucial to digital transformation, and provide snapshots of the future usage and evolution of these technologies.
My second contribution to the state-of-the-state series is "Identity Management Moves from Who to What". Here's an excerpt from the report:
In spite of all the fuss, personal identity is not usually important in routine business. Most transactions are authorized according to someone’s credentials, membership, role or other properties, rather than their personal details. Organizations actually deal with many people in a largely impersonal way. People don’t often care who someone really is before conducting business with them. So in digital Identity Management (IdM), one should care less about who a party is than what they are, with respect to attributes that matter in the context we’re in. This shift in focus is coming to dominate the identity landscape, for it simplifies a traditionally multi-disciplined problem set. Historically, the identity management community has made too much of identity!
Six Digital Identity Trends for 2015
1. Mobile becomes the center of gravity for identity. The mobile device brings convergence for a decade of progress in IdM. For two-factor authentication, the cell phone is its own second factor, protected against unauthorized use by PIN or biometric. Hardly anyone ever goes anywhere without their mobile - service providers can increasingly count on that without disenfranchising many customers. Best of all, the mobile device itself joins authentication to the app, intimately and seamlessly, in the transaction context of the moment. And today’s phones have powerful embedded cryptographic processors and key stores for accurate mutual authentication, and mobile digital wallets, as Apple’s Tim Cook highlighted at the recent White House Cyber Security Summit.
2. Hardware is the key – and holds the keys – to identity. Despite the lure of the cloud, hardware has re-emerged as pivotal in IdM. All really serious security and authentication takes place in secure dedicated hardware, such as SIM cards, ATMs, EMV cards, and the new Trusted Execution Environments in mobile devices. Today’s leading authentication initiatives, like the FIDO Alliance, are intimately connected to standard cryptographic modules now embedded in most mobile devices. Hardware-based identity management has arrived just in the nick of time, on the eve of the Internet of Things.
3. The “Attributes Push” will shift how we think about identity. In the words of Andrew Nash, CEO of Confyrm Inc. (and previously the identity leader at PayPal and Google), “Attributes are at least as interesting as identities, if not more so.” Attributes are to identity as genes are to organisms – they are really what matters about you when you’re trying to access a service. By fractionating identity into attributes and focusing on what we really need to reveal about users, we can enhance privacy while automating more and more of our everyday transactions.
The Attributes Push may recast social logon. Until now, Facebook and Google have been widely tipped to become “Identity Providers”, but even these giants have found federated identity easier said than done. A dark horse in the identity stakes – LinkedIn – may take the lead with its superior holdings in verified business attributes.
4. The identity agenda is narrowing. For 20 years, brands and organizations have obsessed about who someone is online. And even before we’ve solved the basics, we over-reached. We've seen entrepreneurs trying to monetize identity, and identity engineers trying to convince conservative institutions like banks that “Identity Provider” is a compelling new role in the digital ecosystem. Now at last, the IdM industry agenda is narrowing toward more achievable and more important goals - precise authentication instead of general identification.
5. A digital identity stack is emerging. The FIDO Alliance and others face a challenge in shifting and improving the words people use in this space. Words, of course, matter, as do visualizations. IdM has suffered for too long under loose and misleading metaphors. One of the most powerful abstractions in IT was the OSI networking stack. A comparable sort of stack may be emerging in IdM.
6. Continuity will shape the identity experience. Continuity will make or break the user experience as the lines blur between real world and virtual, and between the Internet of Computers and the Internet of Things. But at the same time, we need to preserve clear boundaries between our digital personae, or else privacy catastrophes await. “Continuous” (also referred to as “Ambient”) Authentication is a hot new research area, striving to provide more useful and flexible signals about the instantaneous state of a user at any time. There is an explosion in devices now that can be tapped for Continuous Authentication signals, and by the same token, rich new apps in health, lifestyle and social domains, running on those very devices, that need seamless identity management.
A snapshot of my report "Identity Management Moves from Who to What" is available for download at Constellation Research. It expands on the points above, and sets out recommendations for enterprises to adopt the latest identity management thinking.