World Wide Web inventor Sir Tim Berners-Lee has given a speech in London, re-affirming the importance of privacy, but unfortunately he has muddied the waters by casting aspersions on privacy law. Berners-Lee makes a technologist's error, calling for unworkable new privacy mechanisms where none in fact are warranted.
The Telegraph reports Berners-Lee as saying "Some people say privacy is dead – get over it. I don't agree with that. The idea that privacy is dead is hopeless and sad." He highlighted that peoples' participation in potentially beneficial programs like e-health is hampered by a lack of trust, and a sense that spying online is constant.
Of course he's right about that. Yet he seems to underestimate the data privacy protections we already have. Instead he envisions "a world in which I have control of my data. I can sell it to you and we can negotiate a price, but more importantly I will have legal ownership of all the data about me" he said according to The Telegraph.
It's a classic case of being careful what you ask for, in case you get it. What would control over "all data about you" look like? Most of the data about us these days - most of the personal data, aka Personally Identifiable Information (PII) - is collected or created behind our backs, by increasingly sophisticated algorithms. Now, people certainly don't know enough about these processes in general, and in too few cases are they given a proper opportunity to opt in to Big Data processes. Better notice and consent mechanisms are needed for sure, but I don't see that ownership could fix a privacy problem.
What could "ownership" of data even mean? If personal information has been gathered by a business process, or created by clever proprietary algorithms, we get into obvious debates over intellectual property. Look at medical records: in Australia and I suspect elsewhere, it is understood that doctors legally own the medical records about a patient, but that patients have rights to access the contents. The interpretation of medical tests is regarded as the intellectual property of the healthcare professional.
The philosophical and legal quandries are many. With data that is only potentially identifiable, at what point would ownership flip from the data's creator to the individual to whom it applies? What if data applies to more than one person, as in household electricity records, or, more seriously, DNA?
What really matters is preventing the exploitation of people through data about them. Privacy (or, strictly speaking, data protection) is fundamentally about restraint. When an organisation knows you, they should be restrained in what they can do with that knowledge, and not use it against your interests. And thus, in over 100 countries, we see legislated privacy principles which require that organisations only collect the PII they really need for stated purposes, that PII collected for one reason not be re-purposed for others, that people are made reasonably aware of what's going on with their PII, and so on.
Berners-Lee alluded to the privacy threats of Big Data, and he's absolutely right. But I point out that existing privacy law can substantially deal with Big Data. It's not necessary to make new and novel laws about data ownership. When an algorithm works out something about you, such as your risk of developing diabetes, without you having to fill out a questionnaire, then that process has collected PII, albeit indirectly. Technology-neutral privacy laws don't care about the method of collection or creation of PII. Synthetic personal data, collected as it were algorithmically, is treated by the law in the same way as data gathered overtly. An example of this principle is found in the successful European legal action against Facebook for automatic tag suggestions, in which biometric facial recognition algorithms identify people in photos without consent.
Technologists often under-estimate the powers of existing broadly framed privacy laws, doubtless because technology neutrality is not their regular stance. It is perhaps surprising, yet gratifying, that conventional privacy laws treat new technologies like Big Data and the Internet of Things as merely potential new sources of personal information. If brand new algorithms give businesses the power to read the minds of shoppers or social network users, then those businesses are limited in law as to what they can do with that information, just as if they had collected it in person. Which is surely what regular people expect.
For many years, American businesses have enjoyed a bit of special treatment under European data privacy laws. The so-called "Safe Harbor" arrangement was negotiated by the Federal Communications Commission (FCC) so that companies could self-declare broad compliance with data security rules. Normally organisations are not permitted to move Personally Identifiable Information (PII) about Europeans beyond the EU unless the destination has equivalent privacy measures in place. The "Safe Harbor" arrangement was a shortcut around full compliance; as such it was widely derided by privacy advocates outside the USA, and for some years had been questioned by the more activist regulators in Europe. And so it seemed inevitable that the arrangement would be eventually annulled, as it was last October.
With the threat of most personal data flows from Europe into America being halted, US and EU trade officials have worked overtime for five months to strike a new deal. Today (January 29) the US Department of Commerce announced the "EU-US Privacy Shield".
The Privacy Shield is good news for commerce of course. But I hope that in the excitement, American businesses don't lose sight of the broader sweep of privacy law. Even better would be to look beyond compliance, and take the opportunity to rethink privacy, because there is more to it than security and regulatory short cuts.
The Privacy Shield and the earlier Safe Harbor arrangement are really only about satisfying one corner of European data protection laws, namely transborder flows. The transborder data flow rules basically say you must not move personal data from an EU state into a jurisdiction where the privacy protections are weaker than in Europe. Many countries actually have the same sort of laws, including Australia. Normally, as a business, you would have to demonstrate to a European data protection authority (DPA) that your information handling is complying with EU laws, either by situating your data centre in a similar jurisdiction, or by implementing legally binding measures for safeguarding data to EU standards. This is why so many cloud service providers are now building fresh infrastructure in the EU.
But there is more to privacy than security and data centre location. American businesses must not think that just because there is a new get-out-of-jail clause for transborder flows, their privacy obligations are met. Much more important than raw data security are the bedrocks of privacy: Collection Limitation, Usage Limitation, and Transparency.
Basic data privacy laws the world-over require organisations to exercise constraint and openness. That is, Personal Information must not be collected without a real demonstrated need (or without consent); once collected for a primary purpose, Personal Information should not be used for unrelated secondary purposes; and individuals must be given reasonable notice of what personal data is being collected about them, how it is collected, and why. It's worth repeating: general data protection is not unique to Europe; at last count, over 100 countries around the world had passed similar laws; see Prof Graham Greenleaf's Global Tables of Data Privacy Laws and Bills, January 2015.
Over and above Safe Harbor, American businesses have suffered some major privacy missteps. The Privacy Shield isn't going to make overall privacy better by magic.
For instance, Google in 2010 was caught over-collecting personal information through its StreetView cars. It is widely known (and perfectly acceptable) that mapping companies use the positions of unique WiFi routers for their geolocation databases. Google continuously collects WiFi IDs and coordinates via its StreetView cars. The privacy problem here was that some of the StreetView cars were also collecting unencrypted WiFi traffic (for "research purposes") whenever they came across it. In over a dozen countries around the world, Google admitted they had breached local privacy laws by colelcting excessive PII, apologised for the overreach, explained it as inadvertent, and deleted all the WiFi records in question. The matter was settled in just a few months in places like Korea, Japan and Australia. But in the US, where there is no general collection limitation privacy rule, Google has been defending what they did. Absent general data privacy protection, the strongest legislation that seems to apply to the StreetView case is wire tap law, but its application to the Internet is complex. And so the legal action has taken years and years, and it's still not resolved.
I don't know why Google doesn't see that a privacy breach in the rest of the world is a privacy breach in the US, and instead of fighting it, concede that the collection of WiFi traffic was unnecessary and wrong.
Other proof that European privacy law is deeper and broader than the Privacy Shield is found in social networking mishaps. Over the years, many of Facebook's business practices for instance have been found unlawful in the EU. Recently there was the final ruling against "Find Friends", which uploads the contact details of third parties without their consent. Before that there was the long running dispute over biometric photo tagging. When Facebook generates tag suggestions, what they're doing is running facial recognition algorithms over photos in their vast store of albums, without the consent of the people in those photos. Identifying otherwise anonymous people, without consent (and without restraint as to what might be done next with that new PII), seems to be an unlawful under the Collection Limitation and Usage Limitation principles.
In 2012, Facebook was required to shut down their photo tagging in Europe. They have been trying to re-introduce it ever since. Whether they are successful or not will have nothing to do with the "Privacy Shield".
The Privacy Shield comes into a troubled trans-Atlantic privacy environment. Whether or not the new EU-US arrangement fares better than the Safe Harbor remains to be seen. But in any case, since the Privacy Shield really aims to free up business access to data, sadly it's unlikely to do much good for true privacy.
The examples cited here are special cases of the collision of Big Data with data privacy, which is one of my special interest areas at Constellation Research. See for example "Big Privacy" Rises to the Challenges of Big Data.
A big part of my research agenda in the Digital Safety theme at Constellation is privacy. And what a vexed topic it is! It's hard to even know how to talk about privacy. For many years, folks have covered privacy in more or less academic terms, drawing on sociology, politics and pop psychology, joining privacy to human rights, and crafting new various legal models.
Meanwhile the data breaches get worse, and most businesses have just bumped along.
When you think about it, it’s obvious really: there’s no such thing as perfect privacy. The real question is not about ‘fundamental human rights’ versus business, but rather, how can we optimise a swarm of competing interests around the value of information?
Privacy is emerging as one of the most critical and strategic of our information assets. If we treat privacy as an asset, instead of a burden, businesses can start to cut through this tough topic.
But here’s an urgent issue. A recent regulatory development means privacy may just stop a lot of business getting done. It's the European Court of Justice decision to shut down the US-EU Safe Harbor arrangement.
The privacy Safe Harbor was a work-around negotiated by the Federal Trade Commission, allowing companies to send personal data from Europe into the US.
But the Safe Harbor is no more. It's been ruled unlawful. So it’s a big, big problem for European operations, many multinationals, and especially US cloud service providers.
At Constellation we've researched cloud geography and previously identified competitive opportunities for service providers to differentiate and compete on privacy. But now this is an urgent issue.
It's time American businesses stopped getting caught out by global privacy rulings. There shouldn't be too many surprises here, if you understand what data protection means internationally. Even the infamous "Right To Be Forgotten" ruling on Google’s search engine – which strikes so many technologists as counter intuitive – was a rational and even predictable outcome of decades old data privacy law.
The leading edge of privacy is all about Big Data. And we aint seen nothin yet!
Look at artificial intelligence, Watson Health, intelligent personal assistants, hackable cars, and the Internet of Everything where everything is instrumented, and you see information assets multiplying exponentially. Privacy is actually just one part of this. It’s another dimension of information, one that can add value, but not in a neat linear way. The interplay of privacy, utility, usability, efficiency, efficacy, security, scalability and so on is incredibly complex.
The broader issue is Digital Safety: safety for your customers, and safety for your business.
The identerati sometimes refer to the challenge of “binding carbon to silicon”. That’s a poetic way of describing how the field of Identity and Access Management (IDAM) is concerned with associating carbon-based life forms (as geeks fondly refer to people) with computers (or silicon chips).
To securely bind users’ identities or attributes to their computerised activities is indeed a technical challenge. In most conventional IDAM systems, there is only circumstantial evidence of who did what and when, in the form of access logs and audit trails, most of which can be tampered with or counterfeited by a sufficiently determined fraudster. To create a lasting, tamper-resistant impression of what people do online requires some sophisticated technology (in particular, digital signatures created using hardware-based cryptography).
On the other hand, working out looser associations between people and computers is the stock-in-trade of social networking operators and Big Data analysts. So many signals are emitted as a side effect of routine information processing today that even the shyest of users may be uncovered by third parties with sufficient analytics know-how and access to data.
So privacy is in peril. For the past two years, big data breaches have only got bigger: witness the losses at Target (110 million), EBay (145 million), Home Depot (109 million records) and JPMorgan Chase (83 million) to name a few. Breaches have got deeper, too. Most notably, in June 2015 the U.S. federal government’s Office of Personnel Management (OPM) revealed it had been hacked, with the loss of detailed background profiles on 15 million past and present employees.
I see a terrible systemic weakness in the standard practice of information security. Look at the OPM breach: what was going on that led to application forms for employees dating back 15 years remaining in a database accessible from the Internet? What was the real need for this availability? Instead of relying on firewalls and access policies to protect valuable data from attack, enterprises need to review which data needs to be online at all.
We urgently need to reduce the exposed attack surface of our information assets. But in the information age, the default has become to make data as available as possible. This liberality is driven both by the convenience of having all possible data on hand, just in case in it might be handy one day, and by the plummeting cost of mass storage. But it's also the result of a technocratic culture that knows "knowledge is power," and gorges on data.
In communications theory, Metcalfe’s Law states that the value of a network is proportional to the square of the number of devices that are connected. This is an objective mathematical reality, but technocrats have transformed it into a moral imperative. Many think it axiomatic that good things come automatically from inter-connection and information sharing; that is, the more connection the better. Openness is an unexamined rallying call for both technology and society. “Publicness” advocate Jeff Jarvis wrote (admittedly provocatively) that: “The more public society is, the safer it is”. And so a sort of forced promiscuity is shaping up as the norm on the Internet of Things. We can call it "superconnectivity", with a nod to the special state of matter where electrical resistance drops to zero.
In thinking about privacy on the IoT, a key question is this: how much of the data emitted from Internet-enabled devices will actually be personal data? If great care is not taken in the design of these systems, the unfortunate answer will be most of it.
My latest investigation into IoT privacy uses the example of the Internet connected motor car. "Rationing Identity on the Internet of Things" will be released soon by Constellation Research.
And don't forget Constellation's annual innovation summit, Connected Enterprise at Half Moon Bay outside San Francisco, November 4th-6th. Early bird registration closes soon.
An unpublished letter to New Yorker magazine, August 2015.
Kelefa Sanneh ("The Hell You Say", Aug 10 & 17) poses a question close to the heart of society’s analog-to-digital conversion: What is speech?
Internet policy makers worldwide are struggling with a recent European Court of Justice decision which grants some rights to individuals to have search engines like Google block results that are inaccurate, irrelevant or out of date. Colloquially known as the "Right To Be Forgotten" (RTBF), the ruling has raised the ire of many Americans in particular, who typically frame it as yet another attack on free speech. Better defined as a right to be de-listed, RTBF makes search providers consider the impact on individuals of search algorithms, alongside their commercial interests. For there should be no doubt – search is very big business. Google and its competitors use search to get to know people, so they can sell better advertising.
Search results are categorically not the sort of text which contributes to "democratic deliberation". Free speech may be many things but surely not the mechanical by-products of advertising processes. To protect search results as such mocks the First Amendment.
Some of my other RTBF thoughts:
- Search is not a passive reproduction; Google makes the public domain public.
- Google's deeply divided Advisory Council was strangely silent on the business nature of search.
- Search results are a special form of Big Data, and not the sort of thing that counts as speech.
In the latest course of a 15 month security feast, BlackBerry has announced it is acquiring mobile device management (MDM) provider Good Technology. The deal is said to be definitive, for US$425 million in cash.
As BlackBerry boldly re-positions itself as a managed service play in the Internet of Things, adding an established MDM capability to its portfolio will bolster its claim -- which still surprises many -- to be handset neutral. But the Good buy is much more than that. It has to be seen in the context of John Chen's drive for cross-sector security and privacy infrastructure for the IoT.
As I reported from the recent BlackBerry Security Summit in New York, the company has knitted together a comprehensive IoT security fabric. Look at how they paint their security platform:
And see how Good will slip neatly into the Platform Services column. It's the latest in what is now a $575 million investment in non-organic security growth (following purchases of Secusmart, Watchdox, Movirtu and Athoc).
According to BlackBerry,
- Good will bring complementary capabilities and technologies to BlackBerry, including secure applications and containerization that protects end user privacy. With Good, BlackBerry will expand its ability to offer cross-platform EMM solutions that are critical in a world with varying deployment models such as bring-your-own-device (BYOD); corporate owned, personally enabled (COPE); as well as environments with multiple user interfaces and operating systems. Good has expertise in multi-OS management with 64 percent of activations from iOS devices, followed by a broad Android and Windows customer base.(1) This experience combined with BlackBerry’s strength in BlackBerry 10 and Android management – including Samsung KNOX-enabled devices – will provide customers with increased choice for securely deploying any leading operating system in their organization.
The strategic acquisition of Good Technology will also give the Identity-as-a-Service sector a big kick. IDaaS is become a crowded space with at least ten vendors (CA, Centrify, IBM, Microsoft, Okta, OneLogin, Ping, Salepoint, Salesforce, VMware) competing strongly around a pretty well settled set of features and functions. BlackBerry themselves launched an IDaaS a few months ago. At the Security Summit, I asked their COO Marty Beard what is going to distinguishe their offering in such a tight market, and he said, simply, mobility. Presto!
But IDaaS is set to pivot. We all know that mobility is now the locus of security , and we've seen VMware parlay its AirWatch investment into a competitive new cloud identity service. This must be more than a catch-up play with so many entrenched IDaaS vendors.
Here's the thing. I foresee identity actually disappearing from the user experience, which more and more will just be about the apps. I discussed this development in a really fun "Identity Innovators" video interview recorded with Ping at the recent Cloud Identity Summit. For identity to become seamless with the mobile application UX, we need two things. Firstly, federation protocols so that different pieces of software can hand over attributes and authentication signals to one another, and these are all in place now. But secondly we also need fully automated mobile device management as a service, and that's where Good truly fits with the growing BlackBerry platform.
Now stay tuned for new research coming soon via Constellation on the Internet of Things, identity, privacy and software reliability.
See also The State of Identity Management in 2015.
On July 23, BlackBerry hosted its second annual Security Summit, once again in New York City. As with last year’s event, this was a relatively intimate gathering of analysts and IT journalists, brought together for the lowdown on BlackBerry’s security and privacy vision.
By his own account, CEO John Chen has met plenty of scepticism over his diverse and, some say, chaotic product and services portfolio. And yet it’s beginning to make sense. There is a strong credible thread running through Chen’s initiatives. It all has to do with the Internet of Things.
Disclosure: I traveled to the Blackberry Security Summit as a guest of Blackberry, which covered my transport and accommodation.
The Growth Continues
In 2014, John Chen opened the show with the announcement he was buying the German voice encryption firm Secusmart. That acquisition appears to have gone well for all concerned; they say nobody has left the new organisation in the 12 months since. News of BlackBerry’s latest purchase - of crisis communications platform AtHoc - broke a few days before this year’s Summit, and it was only the most recent addition to the family. In the past 12 months, BlackBerry has been busy spending $150M on inorganic growth, picking up:
Chen has also overseen an additional $100M expenditure in the same timeframe on organic security expansion (over and above baseline product development). Amongst other things BlackBerry has:
The Growth Explained - Secure Mobile Communications
Executives from different business units and different technology horizontals all organised their presentations around what is now a comprehensive security product and services matrix. It looks like this (before adding AtHoc):
BlackBerry is striving to lead in Secure Mobile Communications. In that context the highlights of the Security Summit for mine were as follows.
The Internet of Things
BlackBerry’s special play is in the Internet of Things. It’s the consistent theme that runs through all their security investments, because as COO Marty Beard says, IoT involves a lot more than machine-to-machine communications. It’s more about how to extract meaningful data from unbelievable numbers of devices, with security and privacy. That is, IoT for BlackBerry is really a security-as-a-service play.
Chief Security Officer David Kleidermacher repeatedly stressed the looming challenge of “how to patch and upgrade devices at scale”.
- MyPOV: Functional upgrades for smart devices will of course be part and parcel of IoT, but at the same time, we need to work much harder to significantly reduce the need for reactive security patches. I foresee an angry consumer revolt if things that never were computers start to behave and fail like computers. A radically higher standard of quality and reliability is required. Just look at the Jeep Uconnect debacle, where it appears Chrysler eventually thought better of foisting a patch on car owners and instead opted for a much more expensive vehicle recall. It was BlackBerry’s commitment to ultra high reliability software that really caught my attention at the 2014 Security Summit, and it convinces me they grasp what’s going to be required to make ubiquitous computing properly seamless.
Refreshingly, COO Beard preferred to talk about economic value of the IoT, rather than the bazillions of devices we are all getting a little jaded about. He said the IoT would bring about $4 trillion of required technology within a decade, and that the global economic impact could be $11 trillion.
BlackBerry’s real time operating system QNX is in 50 million cars today.
AtHoc is a secure crisis communications service, with its roots in the first responder environment. It’s used by three million U.S. government workers today, and the company is now pushing into healthcare.
Founder and CEO Guy Miasnik explained that emergency communications involves more than just outbound alerts to people dealing with disasters. Critical to crisis management is the secure inbound collection of info from remote users. AtHoc is also not just about data transmission (as important as that is) but it works also at the application layer, enabling sophisticated workflow management. This allows procedures for example to be defined for certain events, guiding sets of users and devices through expected responses, escalating issues if things don’t get done as expected.
We heard more about BlackBerry’s collaboration with Oxford University on the Centre for High Assurance Computing Excellence, first announced in April at the RSA Conference. CHACE is concerned with a range of fundamental topics, including formal methods for verifying program correctness (an objective that resonates with BlackBerry’s secure operating system division QNX) and new security certification methodologies, with technical approaches based on the Common Criteria of ISO 15408 but with more agile administration to reduce that standard’s overhead and infamous rigidity.
CSO Kleidermacher announced that CHACE will work with the Diabetes Technology Society on a new healthcare security standards initiative. The need for improved medical device security was brought home vividly by an enthralling live demonstration of hacking a hospital drug infusion pump. These vulnerabilities have been exposed before at hacker conferences but BlackBerry’s demo was especially clear and informative, and crafted for a non-technical executive audience.
- MyPOV: The message needs to be broadcast loud and clear: there are life-critical machines in widespread use, built on commercial computing platforms, without any careful thought for security. It’s a shameful and intolerable situation.
I was impressed by BlackBerry’s privacy line. It's broader and more sophisticated than most security companies, going way beyond the obvious matters of encryption and VPNs. In particular, the firm champions identity plurality. For instance, WorkLife by BlackBerry, powered by Movirtu technology, realizes multiple identities on a single phone. BlackBerry is promoting this capability in the health sector especially, where there is rarely a clean separation of work and life for professionals. Chen said he wants to “separate work and private life”.
The health sector in general is one of the company’s two biggest business development priorities (the other being automotive). In addition to sophisticated telephony like virtual SIMs, they plan to extend extend AtHoc into healthcare messaging, and have tasked the CHACE think-tank with medical device security. These actions complement BlackBerry’s fine words about privacy.
So BlackBerry’s acquisition plan has gelled. It now has perhaps the best secure real time OS for smart devices, a hardened device-independent Mobile Device Management backbone, new data-centric privacy and rights management technology, remote certificate management, and multi-layered emergency communications services that can be diffused into mission-critical rules-based e-health settings and, eventually, automated M2M messaging. It’s a powerful portfolio that makes strong sense in the Internet of Things.
BlackBerry says IoT is 'much more than device-to-device'. It’s more important to be able to manage secure data being ejected from ubiquitous devices in enormous volumes, and to service those things – and their users – seamlessly. For BlackBerry, the Internet of Things is really all about the service.
Every year the Constellation SuperNova Awards recognise eight individuals for their leadership in digital business. Nominate yourself or someone you know by August 7, 2015.
The SuperNova Awards honour leaders that demonstrate excellence in the application and adoption of new and emerging technologies. In its fifth year, the SuperNova Awards program will recognise eight individuals who demonstrate true leadership in digital business through their application of new and emerging technologies. Constellation Research is searching for leaders and corporate teams who have innovatively applied disruptive technolgies to their businesses, to adapt to the rapidly-changing digital business environment. Special emphasis will be given to projects that seek to redefine how the enterprise uses technology on a large scale.
We’re searching for the boldest, most transformative technology projects out there. Apply for a SuperNova Award by filling out the application here: http://www.constellationr.com/node/3137/apply
SuperNova Award Categories
- Consumerization of IT & The New C-Suite - The Enterprise embraces consumer tech, and perfects it.
- Data to Decisions - Using data to make informed business decisions.
- Digital Marketing Transformation - Put away that megaphone. Marketing in the digital age requires a new approach.
- Future of Work - The processes and technologies addressing the rapidly shifting work paradigm.
- Matrix Commerce - Commerce responds to changing realities from the supply chain to the storefront.
- Next Generation Customer Experience - Customers in the digital age demand seamless service throughout all lifecycle stages and across all channels.
- Safety and Privacy - Not 'security'. Safety and Privacy is the art and science of the art and science of protecting information assets, including your most important assets: your people.
- Technology Optimization & Innovation - Innovative methods to balance innovation and budget requirements.
Five reasons to apply for a SuperNova Award
- Exposure to the SuperNova Award judges, comprised of the top influencers in enterprise technology
- Case study highlighting the achievements of the winners written by Constellation analysts
- Complimentary admission to the SuperNova Award Gala Dinner and Constellation's Connected Enterprise for all finalists November 4-6, 2015 (NB: lodging and travel not included)
- One year unlimited access to Constellation's research library
- Winners featured on Constellation's blog and weekly newsletter.
Ray Wang tells us now that writing a book and launching a company are incredibly fulfilling things to do - but ideally, not at the same time. He thought it would take a year to write "Disrupting Digital Business", but since it overlapped with building Constellation Research, it took three! But at the same time, his book is all the richer for that experience.
Ray is on a world-wide book tour (tweeting under the hash tag #cxotour). I was thrilled to participate in the Melbourne leg last week. We convened a dinner at Melbourne restaurant The Deck and were joined by a good cross section of Australian private and public sector businesses. There were current and recent executives from Energy Australia, Rio Tinto, the Victorian Government and Australia Post among others, plus the founders of several exciting local start-ups. And we were lucky to have special guests Brian Katz and Ben Robbins - two renowned mobility gurus.
The format for all the launch events has one or two topical short speeches from Constellation analysts and Associates, and a fireside chat by Ray. In Melbourne, we were joined by two of Australia's deep digital economy experts, Gavin Heaton and Joanne Jacobs. Gavin got us going on the night, surveying the importance of innovation, and the double edged opportunities and threats of digital disruption.
Then Ray spoke off-the-cuff about his book, summarising years of technology research and analysis, and the a great many cases of business disruption, old and new. Ray has an encyclopedic grasp of tech-driven successes and failures going back decades, yet his presentations are always up-to-the-minute and full of practical can-do calls to action. He's hugely engaging, and having him on a small stage for a change lets him have a real conversation with the audience.
Speaking with no notes and PowerPoint-free, Ray ranged across all sorts of disruptions in all sorts of sectors, including:
- Sony's double cassette Walkman (which Ray argues playfully was their "last innovation")
- Coca Cola going digital, and the speculative "ten cent sip"
- the real lesson of the iPhone: geeks spend time arguing about whether Apple's technology is original or appropriated, when the point is their phone disrupted 20 or more other business models
- the contrasting Boeing 787 Dreamliner and Airbus A380 mega jumbo - radically different ways to maximise the one thing that matters to airlines: dollars per passenger-miles, and
- Uber, which observers don't always fully comprehend as a rich mix of mobility, cloud and Big Data.
And I closed the scheduled part of the evening with a provocation on privacy. I asked the group to think about what it means to call any online business practice "creepy". Have community norms and standards really changed in the move online? What's worse: government surveillance for political ends, or private sector surveillance for profit? If we pay for free online services with our personal information, do regular consumers understand the bargain? And if cynics have been asking "Is Privacy Dead?" for over 100 years, doesn't it mean the question is purely rhetorical? Who amongst us truly wants privacy to be over?!
The discussion quickly attained a life of its own - muscular, but civilized. And it provided ample proof that whatever you think about privacy, it is complicated and surprising, and definitely disruptive! (For people who want to dig further into the paradoxes of modern digital privacy, Ray and I recently recorded a nice long chat about it).
Here are some of the Digital Disruption tour dates coming up:
Acknowledgement: Daniel Barth-Jones kindly engaged with me after this blog was initially published, and pointed out several significant factual errors, for which I am grateful.
In 2014, the New York Taxi & Limousine Company (TLC) released a large "anonymised" dataset containing 173 million taxi rides taken in 2013. Soon after, software developer Vijay Pandurangan managed to undo the hashed taxi registration numbers. Subsequently, privacy researcher Anthony Tockar went on to combine public photos of celebrities getting in or out of cabs, to recreate their trips. See Anna Johnston's analysis here.
This re-identification demonstration has been used by some to bolster a general claim that anonymity online is increasingly impossible.
On the other hand, medical research advocates like Columbia University epidemiologist Daniel Barth-Jones argue that the practice of de-identification can be robust and should not be dismissed as impractical on the basis of demonstrations such as this. The identifiability of celebrities in these sorts of datasets is a statistical anomaly reasons Barth-Jones and should not be used to frighten regular people out of participating in medical research on anonymised data. He wrote in a blog that:
- "However, it would hopefully be clear that examining a miniscule proportion of cases from a population of 173 million rides couldn’t possibly form any meaningful basis of evidence for broad assertions about the risks that taxi-riders might face from such a data release (at least with the taxi medallion/license data removed as will now be the practice for FOIL request data)."
As a health researcher, Barth-Jones is understandably worried that re-identification of small proportions of special cases is being used to exaggerate the risks to ordinary people. He says that the HIPAA de-identification protocols if properly applied leave no significant risk of re-id. But even if that's the case, HIPAA processes are not applied to data across the board. The TLC data was described as "de-identified" and the fact that any people at all (even stand-out celebrities) could be re-identified from data does create a broad basis for concern - "de-identified" is not what it seems. Barth-Jones stresses that in the TLC case, the de-identification was fatally flawed [technically: it's no use hashing data like registration numbers with limited value ranges because the hashed values can be reversed by brute force] but my point is this: who among us who can tell the difference between poorly de-identified and "properly" de-identified?
And how long can "properly de-identified" last? What does it mean to say casually that only a "minuscule proportion" of data can be re-identified? In this case, the re-identification of celebrities was helped by the fact lots of photos of them are readily available on social media, yet there are so many photos in the public domain now, regular people are going to get easier to be identified.
But my purpose here is not to play what-if games, and I know Daniel advocates statistically rigorous measures of identifiability. We agree on that -- in fact, over the years, we have agreed on most things. The point I am trying to make in this blog post is that, just as nobody should exaggerate the risk of re-identification, nor should anyone play it down. Claims of de-identification are made almost daily for really crucial datasets, like compulsorily retained metadata, public health data, biometric templates, social media activity used for advertising, and web searches. Some of these claims are made with statistical rigor, using formal standards like the HIPAA protocols; but other times the claim is casual, made with no qualification, with the aim of comforting end users.
"De-identified" is a helluva promise to make, with far-reaching ramifications. Daniel says de-identification researchers use the term with caution, knowing there are technical qualifications around the finite probability of individuals remaining identifiable. But my position is that the fine print doesn't translate to the general public who only hear that a database is "anonymous". So I am afraid the term "de-identified" is meaningless outside academia, and in casual use is misleading.
Barth-Jones objects to the conclusion that "it's virtually impossible to anonymise large data sets" but in an absolute sense, that claim is surely true. If any proportion of people in a dataset may be identified, then that data set is plainly not "anonymous". Moreover, as statistics and mathematical techniques (like facial recognition) improve, and as more ancillary datasets (like social media photos) become accessible, the proportion of individuals who may be re-identified will keep going up.[Readers who wish to pursue these matters further should look at the recent Harvard Law School online symposium on "Re-identification Demonstrations", hosted by Michelle Meyer, in which Daniel Barth-Jones and I participated, among many others.]
Both sides of this vexed debate need more nuance. Privacy advocates have no wish to quell medical research per se, nor do they call for absolute privacy guarantees, but we do seek full disclosure of the risks, so that the cost-benefit equation is understood by all. One of the obvious lessons in all this is that "anonymous" or "de-identified" on their own are poor descriptions. We need tools that meaningfully describe the probability of re-identification. If statisticians and medical researchers take "de-identified" to mean "there is an acceptably small probability, namely X percent, of identification" then let's have that fine print. Absent the detail, lay people can be forgiven for thinking re-identification isn't going to happen. Period.
And we need policy and regulatory mechanisms to curb inappropriate re-identification. Anonymity is a brittle, essentially temporary, and inadequate privacy tool.
I argue that the act of re-identification ought to be treated as an act of Algorithmic Collection of PII, and regulated as just another type of collection, albeit an indirect one. If a statistical process results in a person's name being added to a hitherto anonymous record in a database, it is as if the data custodian went to a third party and asked them "do you know the name of the person this record is about?". The fact that the data custodian was clever enough to avoid having to ask anyone about the identity of people in the re-identified dataset does not alter the privacy responsibilities arising. If the effect of an action is to convert anonymous data into personally identifiable information (PII), then that action collects PII. And in most places around the world, any collection of PII automatically falls under privacy regulations.
It looks like we will never guarantee anonymity, but the good news is that for privacy, we don't actually need to. Privacy is the protection you need when you affairs are not anonymous, for privacy is a regulated state where organisations that have knowledge about you are restrained in what they do with it. Equally, the ability to de-anonymise should be restricted in accordance with orthodox privacy regulations. If a party chooses to re-identify people in an ostensibly de-identified dataset, without a good reason and without consent, then that party may be in breach of data privacy laws, just as they would be if they collected the same PII by conventional means like questionnaires or surveillance.
Surely we can all agree that re-identification demonstrations serve to shine a light on the comforting claims made by governments for instance that certain citizen datasets can be anonymised. In Australia, the government is now implementing telecommunications metadata retention laws, in the interests of national security; the metadata we are told is de-identified and "secure". In the UK, the National Health Service plans to make de-identified patient data available to researchers. Whatever the merits of data mining in diverse fields like law enforcement and medical research, my point is that any government's claims of anonymisation must be treated critically (if not skeptically), and subjected to strenuous and ongoing privacy impact assessment.
Privacy, like security, can never be perfect. Privacy advocates must avoid giving the impression that they seek unrealistic guarantees of anonymity. There must be more to privacy than identity obscuration (to use a more technically correct term than "de-identification"). Medical research should proceed on the basis of reasonable risks being taken in return for beneficial outcomes, with strong sanctions against abuses including unwarranted re-identification. And then there wouldn't need to be a moral panic over re-identification if and when it does occur, because anonymity, while highly desirable, is not essential for privacy in any case.