In May 2014, the European Court of Justice (ECJ) ruled that under European law, people have the right to have certain information about them delisted from search engine results. The ECJ ruling was called the "Right to be Forgotten", despite it having little to do with forgetting (c'est la vie). Shortened as RTBF, it is also referred to more clinically as the "Right to be Delisted" (or simply as "Google Spain" because that was one of the parties in the court action). Within just a few months, the RTBF has triggered conferences, public debates, and a TEDx talk.
Google itself did two things very quickly in response to the RTBF ruling. First, it mobilised a major team to process delisting requests. This is no mean feat -- over 200,000 requests have been received to date; see Google's transparency report. However it's not surprising they got going so quickly as they already have well-practiced processes for take-down notices for copyright and unlawful material.
Secondly, the company convened an Advisory Council of independent experts to formulate strategies for balancing the competing rights and interests bound up in RTBF. The Advisory Council delivered its report in January; it's available online here.
I declare I'm a strong supporter of RTBF. I've written about it here and here, and participated in an IEEE online seminar. I was impressed by the intellectual and eclectic make-up of the Council, which includes a past European Justice Minister, law professors, and a philosopher. And I do appreciate that the issues are highly complex. So I had high expectations of the Council's report.
Yet I found it quite barren.
Recap - the basics of RTBF
EU Justice Commissioner Martine Reicherts in a speech last August gave a clear explanation of the scope of the ECJ ruling, and acknowledged its nuances. Her speech should be required reading. Reicherts summed up the situation thus:
- What did the Court actually say on the right to be forgotten? It said that individuals have the right to ask companies operating search engines to remove links with personal information about them – under certain conditions - when information is inaccurate, inadequate, irrelevant, outdated or excessive for the purposes of data processing. The Court explicitly ruled that the right to be forgotten is not absolute, but that it will always need to be balanced against other fundamental rights, such as the freedom of expression and the freedom of the media – which, by the way, are not absolute rights either.
Everyone concerned acknowledges there are tensions in the RTBF ruling. The Google Advisory Council Report mentions these tensions (in Section 3) but sadly spends no time critically exploring them. In truth, all privacy involves conflicting requirements, and to that extent, many features of RTBF have been seen before. At p5, the Report mentions that "the [RTBF] Ruling invokes a data subject’s right to object to, and require cessation of, the processing of data about himself or herself" (emphasis added); the reader may conclude, as I have, that the computing of search results by a search engine is just another form of data processing.
One of the most important RTBF talking points is whether it's fair that Google is made to adjudicate delisting requests. I have some sympathies for Google here, and yet this is not an entirely novel situation in privacy. A standard feature of international principles-based privacy regimes is the right of individuals to have erroneous personal data corrected (this is, for example, OECD Privacy Principle No. 7 - Individual Participation, and Australian Privacy Principle No. 13 - Correction of Personal Information). And at the top of p5, the Council Report cites the right to have errors rectified. So it is standard practice that a data custodian must have means for processing access and correction requests. Privacy regimes expect there to be dispute resolution mechanisms too, operated by the company concerned. None of this is new. What seems to be new to some stakeholders is the idea that the results of a search engine is just another type of data processing.
A little rushed
The Council explains in the Introduction to the Report that it had to work "on an accelerated timeline, given the urgency with which Google had to begin complying with the Ruling once handed down". I am afraid that the Report shows signs of being a little rushed.
- There are several spelling errors.
- The contributions from non English speakers could have done with some editing.
- Less trivially, many of the footnotes need editing; it's not always clear how a person's footnoted quote supports the text.
- More importantly, the Advisory Council surely operated with Terms of Reference, yet there is no clear explanation of what those were. At the end of the introduction, we're told the group was "convened to advise on criteria that Google should use in striking a balance, such as what role the data subject plays in public life, or whether the information is outdated or no longer relevant. We also considered the best process and inputs to Google’s decision making, including input from the original publishers of information at issue, as potentially important aspects of the balancing exercise." I'm surprised there is not a more complete and definitive description of the mission.
- It's not actually clear what sort of search we're all talking about. Not until p7 of the Report does the qualified phrase "name-based search" first appear. Are there other types of search for which the RTBF does not apply?
- Above all, it's not clear that the Council has reached a proper conclusion. The Report makes a number of suggestions in passing, and there is a collection of "ideas" at the back for improving the adjudication process, but there is no cogent set of recommendations. That may be because the Council didn't actually reach consensus.
And that's one of the most surprising things about the whole exercise. Of the eight independent Council members, five of them wrote "dissenting opinions". The work of an expert advisory committee is not normally framed as a court-like determination, from which members might dissent. And even if it was, to have the majority of members "dissent" casts doubt on the completeness or even the constitution of the process. Is there anything definite to be dissented from?
Jimmy Wales, the Wikipedia founder and chair, was especially strident in his individual views at the back of the Report. He referred to "publishers whose works are being suppressed" (p27 of the Report), and railed against the report itself, calling its recommendation "deeply flawed due to the law itself being deeply flawed". Can he mean the entire Charter of Fundamental Rights of the EU and European Convention on Human Rights? Perhaps Wales is the sort of person that denies there are any nuances in privacy, because "suppressed" is an exaggeration if we accept that RTBF doesn't cause anything to be forgotten. In my view, it poisons the entire effort when unqualified insults are allowed to be hurled at the law. If Wales thinks so little of the foundation of both the ECJ ruling and the Advisory Council, he might have declined to take part.
A little hollow
Strangely, the Council's Report is altogether silent on the nature of search. It's such a huge part of their business that I have to think the strength of Google's objection to RTBF is energised by some threat it perceives to its famously secret algorithms.
The Google business was founded on its superior Page Rank search method, and the company has spent fantastic funds on R&D, allowing it to keep a competitive edge for a very long time. And the R&D continues. Curiously, just as everyone is debating RTBF, Google researchers published a paper about a new "knowledge based" approach to evaluating web pages. Surely if page ranking was less arbitrary and more transparent, a lot of the heat would come out of RTBF.
Of all the interests to balance in RTBF, Google's business objectives are actually a legitimate part of the mix. Google provides marvelous free services in exchange for data about its users which it converts into revenue, predominantly through advertising. It's a value exchange, and it need not be bad for privacy. A key component of privacy is transparency: people have a right to know what personal information about them is collected, and why. The RTBF analysis seems a little hollow without frank discussion of what everyone gets out of running a search engine.
- The European Court of Justice Right to be Forgotten ruling.
- Academic commentary compiled by Julia Powles and Rebekah Larsen from Cambridge University.
- "Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources" by Xin Luna Dong et al, Google.
- My analysis of search and free speech, "Free search, a misnomer"
- My analysis of search as a Big Data process, "The rite to be forgotten".
Search engines are wondrous things. I myself use Google search umpteen times a day. I don't think I could work or play without it anymore. And yet I am a strong supporter of the contentious "Right to be Forgotten". The "RTBF" is hotly contested, and I am the first to admit it's a messy business. For one thing, it's not ideal that Google itself is required for now to adjudicate RTBF requests in Europe. But we have to accept that all of privacy is contestable. The balance of rights to privacy and rights to access information is tricky. RTBF has a long way to go, and I sense that European jurors and regulators are open and honest about this.
One of the starkest RTBF debating points is free speech. Does allowing individuals to have irrelevant, inaccurate and/or outdated search results blocked represent censorship? Is it an assault on free speech? There is surely a technical-legal question about whether the output of an algorithm represents "free speech", and as far as I can see, that question remains open. Am I the only commentator suprised by this legal blind spot? I have to say that such uncertainty destabilises a great deal of the RTBF dispute.
I am not a lawyer, but I have a strong sense that search outputs are not the sort of thing that constitutes speech. Let's bear in mind what web search is all about.
Google search is core to its multi-billion dollar advertising business. Search results are not unfiltered replicas of things found in the public domain, but rather the subtle outcome of complex Big Data processes. Google's proprietary search algorithm is famously secret, but we do know how sensitive it is to context. Most people will have noticed that search results change day by day and from place to place. But why is this?
When we enter search parameters, the result we get is actually Google's guess about what we are really looking for. Google in effect forms a hypothesis, drawing on much more than the express parameters, including our search history, browsing history, location and so on. And in all likelihood, search is influenced by the many other things Google gleans from the way we use its other properties -- gmail, maps, YouTube, hangouts and Google+ -- which are all linked now under one master data usage policy.
And here's the really clever thing about search. Google monitors how well it's predicting our real or underlying concerns. It uses a range of signals and metrics, to assess what we do with search results, and it continuously refines those processes. This is what Google really gets out of search: deep understanding of what its users are interested in, and how they are likely to respond to targeted advertising. Each search result is a little test of Google's Artificial Intelligence, which, as some like to say, is getting to know us better than we know ourselves.
As important as they are, it seems to me that search results are really just a by-product of a gigantic information business. They are nothing like free speech.
It would be naive to expect the White House Cybersecurity Summit to have been less political. President Obama and his colleagues were in their comfort zone, talking up America's recent economic turnaround, and framing their recent wins squarely within Silicon Valley where the summit took place. With a few exceptions, the first two hours was more about green energy, jobs and manufacturing than cyber security. It was a lot like a lost episode of The West Wing.
The exceptions were important. Some speakers really nailed some security issues. I especially liked the morning contributions from Intel President Renee James and MasterCard CEO Ajay Banga. James highlighted that Intel has worked for 10 years to improve "the baseline of computing security", making her one of the few speakers to get anywhere near the inherent insecurity of our cyber infrastructure. The truth is that cyberspace is built on weak foundations; the software development practices and operating systems that bear the economy today were not built for the job. For mine, the Summit was too much about military/intelligence themed information sharing, and not enough about why our systems are so precarious. I know it's a dry subject but if they're serious about security, policy makers really have to engage with software quality and reliability, instead of thrilling to kids learning to code. Software development practices are to blame for many of our problems; more on software failures here.
Ajay Banga was one of several speakers to urge the end of passwords. He summed up the authentication problem very nicely: "Stop making us remember things in order to prove who we are". He touched on MasterCard's exploration of continuous authentication bracelets and biometrics (more news of which coincidentally came out today). It's important however that policy makers' understanding of digital infrastructure resilience, cybercrime and cyber terrorism isn't skewed by everyone's favourite security topic - customer authentication. Yes, it's in need of repair, yet authentication is not to blame for the vast majority of breaches. Mom and Pop struggle with passwords and they deserve better, but the vast majority of stolen personal data is lifted by organised criminals en masse from poorly secured back-end databases. Replacing customer passwords or giving everyone biometrics is not going to solve the breach epidemic.
Banga also indicated that the Information Highway should be more like road infrastructure. He highlighted that national routes are regulated, drivers are licensed, there are rules of the road, standardised signs, and enforcement. All these infrastructure arrangements leave plenty of room for innovation in car design, but it's accepted that "all cars have four wheels".
Tim Cook was then the warm-up act before Obama. Many on Twitter unkindly branded Cook's speech as an ad for Apple, paid for by the White House, but I'll accentuate the positives. Cook continues to campaign against business models that monetize personal data. He repeated his promise made after the ApplePay launch that they will not exploit the data they have on their customers. He put privacy before security in everything he said.
Cook painted a vision where digital wallets hold your passport, driver license and other personal documents, under the user's sole control, and without trading security for convenience. I trust that he's got the mobile phone Secure Element in mind; until we can sort out cybersecurity at large, I can't support the counter trend towards cloud-based wallets. The world's strongest banks still can't guarantee to keep credit card numbers safe, so we're hardly ready to put our entire identities in the cloud.
In his speech, President Obama reiterated his recent legislative agenda for information sharing, uniform breach notification, student digital privacy, and a Consumer Privacy Bill of Rights. He stressed the need for private-public partnership and cybersecurity responsibility to be shared between government and business. He reiterated the new Cyber Threat Intelligence Integration Center. And as flagged just before the summit, the president signed an Executive Order that will establish cyber threat information sharing "hubs" and standards to foster sharing while protecting privacy.
Obama told the audience that cybersecurity "is not an ideological issue". Of course that message was actually for Congress which is deliberating over his cyber legislation. But let's take a moment to think about how ideology really does permeate this arena. Three quasi-religious disputes come to mind immediately:
- Free speech trumps privacy. The ideals of free speech have been interpreted in the US in such a way that makes broad-based privacy law intractable. The US is one of only two major nations now without a general data protection statute (the other is China). It seems this impasse is rarely questioned anymore by either side of the privacy debate, but perhaps the scope of the First Amendment has been allowed to creep out too far, for now free speech rights are in effect being granted even to computers. Look at the controversy over the "Right to be Forgotten" (RTBF), where Google is being asked to remove certain personal search results if they are irrelevant, old and inaccurate. Jimmy Wales claims this requirement harms "our most fundamental rights of expression and privacy". But we're not talking about speech here, or even historical records, but rather the output of a computer algorithm, and a secret algorithm at that, operated in the service of an advertising business. The vociferous attacks on RTBF are very ideological indeed.
- "Innovation" trumps privacy. It's become an unexamined mantra that digital businesses require unfettered access to information. I don't dispute that some of the world's richest ever men, and some of the world's most powerful ever corporations have relied upon the raw data that exudes from the Internet. It's just like the riches uncovered by the black gold rush on the 1800s. But it's an ideological jump to extrapolate that all cyber innovation or digital entrepreneurship must continue the same way. Rampant data mining is laying waste to consumer confidence and trust in the Internet. Some reasonable degree of consumer rights regulation seems inevitable, and just, if we are to avert a digital Tragedy of the Commons.
- National Security trumps privacy. I am a rare privacy advocate who actually agrees that the privacy-security equilibrium needs to be adjusted. I believe the world has changed since some of our foundational values were codified, and civil liberties are just one desirable property of a very complicated social system. However, I call out one dimensional ideology when national security enthusiasts assert that privacy has to take a back seat. There are ways to explore a measured re-calibration of privacy, to maintain proportionality, respect and trust.
President Obama described the modern technological world as a "magnificent cathedral" and he made an appeal to "values embedded in the architecture of the system". We should look critically at whether the values of entrepreneurship, innovation and competitiveness embedded in the way digital business is done in America could be adjusted a little, to help restore the self-control and confidence that consumers keep telling us is evaporating online.
If the digital economy is really the economy then it's high time we moved beyond hoping that we can simply train users to be safe online. Is the real economy only for heros who can protect themselves in the jungle, writing their own code. As if they're carrying their own guns? Or do we as a community build structures and standards and insist on technologies that work for all?
For most people, the World Wide Web experience still a lot like watching cartoons on TV. The human-machine interface is almost the same. The images and actions are just as synthetic; crucially, nothing on a web browser is real. Almost anything goes -- just as the Roadrunner defies gravity in besting Coyote, there are no laws of physics that temper the way one bit of multimedia leads to the next. Yes, there is a modicum of user feedback in the way we direct some of the action when browsing and e-shopping, but it's quite illusory; for the most part all we're really doing is flicking channels across a billion pages.
It's the suspension of disbelief when browsing that lies at the heart of many of the safety problems we're now seeing. Inevitably we lose our bearings in the totally synthetic World Wide Web. We don't even realise it, we're taken in by a virtual reality, and we become captive to social engineering.
But I don't think it's possible to tackle online safety by merely countering users' credulity. Education is not the silver bullet, because the Internet is really so technologically complex and abstract that it lies beyond the comprehension of most lay people.
Using the Internet 'safely' today requires deep technical skills, comparable to the level of expertise needed to operate an automobile circa 1900. Back then you needed to be able to do all your own mechanics [roughly akin to the mysteries of maintaining anti-virus software], look after the engine [i.e. configure the operating system and firewall], navigate the chaotic emerging road network [there's yet no trusted directory for the Internet, nor any road rules], and even figure out how to fuel the contraption [consumer IT supply chains is about as primitive as the gasoline industry was 100 years ago]. The analogy with the early car industry becomes especially sharp for me when I hear utopian open source proponents argue that writing ones own software is the best way to be safe online.
The Internet is so critical (I'd have thought this was needless to say) that we need ways of working online that don't require us to all be DIY experts.
I wrote a first draft of this blog six years ago, and at that time I called for patience in building digital literacy and sophistication. "It took decades for safe car and road technologies to evolve, and the Internet is still really in its infancy" I said in 2009. But I'm less relaxed about his now, on the brink of the Internet of Things. It's great that the policy makers like the US FTC are calling on connected device makers to build in security and privacy, but I suspect the Internet of Things will require the same degree of activist oversight and regulation as does the auto industry, for the sake of public order and the economy. Do we have the appetite to temper breakneck innovation with safety rules?
This is an updated version of arguments made in Lockstep's submission to the 2009 Cyber Crime Inquiry by the Australian federal government.
In stark contrast to other fields, cyber safety policy is almost exclusively preoccupied with user education. It's really an obsession. Governments and industry groups churn out volumes of well-meaning and technically reasonable security advice, but for the average user, this material is overwhelming. There is a subtle implication that security is for experts, and that the Internet isn't safe unless you go to extremes. Moreover, even if consumers do their very best online, their personal details can still be taken over in massive criminal raids on databases that hardly anyone even know exist.
Too much onus is put on regular users protecting themselves online, and this blinds us to potential answers to cybercrime. In other walks of life, we accept a balanced approach to safety, and governments are less reluctant to impose standards than they are on the Internet. Road safety for instance rests evenly on enforceable road rules, car technology innovation, certified automotive products, mandatory quality standards, traffic management systems, and driver training and licensing. Education alone would be nearly worthless.
Around cybercrime we have a bizarre allergy to technology. We often hear that 'Preventing data breaches not a technology issue' which may be politically correct but it's faintly ridiculous. Nobody would ever say that preventing car crashes is 'not a technology issue'.
Credit card fraud and ID theft in general are in dire need of concerted technological responses. Consider that our Card Not Present (CNP) payments processing arrangements were developed many years ago for mail orders and telephone orders. It was perfectly natural to co-opt the same processes when the Internet arose, since it seemed simply to be just another communications medium. But the Internet turned out to be more than an extra channel: it connects everyone to everything, around the clock.
The Internet has given criminals x-ray vision into peoples' banking details, and perfect digital disguises with which to defraud online merchants. There are opportunities for crime now that are both quantitatively and qualitatively radically different from what went before. In particular, because identity data is available by the terabyte and digital systems cannot tell copies from originals, identity takeover is child's play.
You don't even need to have ever shopped online to run foul of CNP fraud. Most stolen credit card numbers are obtained en masse by criminals breaking into obscure backend databases. These attacks go on behind the scenes, out of sight of even the most careful online customers.
So the standard cyber security advice misses the point. Consumers are told earnestly to look out for the "HTTPS" padlock that purportedly marks a site as secure, to have a firewall, to keep their PCs "patched" and their anti-virus up to date, to only shop online at reputable merchants, and to avoid suspicious looking sites (as if cyber criminals aren't sufficiently organised to replicate legitimate sites in their entirety). But none of this advice touches on the problem of coordinated massive heists of identity data.
Merchants are on the hook for unwieldy and increasingly futile security overheads. When a business wishes to accept credit card payments, it's straightforward in the real world to install a piece of bank-approved terminal equipment. But to process credit cards online, shopkeepers have to sign up to onerous PCI-DSS requirements that in effect require even small business owners to become IT security specialists. But to what end? No audit regime will ever stop organised crime. To stem identity theft, we need to make stolen IDs less valuable.
All this points to urgent public policy matters for governments and banks. It is not enough to put the onus on individuals to guard against ad hoc attacks on their credit cards. Systemic changes and technological innovation are needed to render stolen personal data useless to thieves. It's not that the whole payments processing system is broken; rather, it is vulnerable at just one point where stolen digital identities can be abused.
Digital identities are the keys to our personal kingdoms. As such they really need to be treated as seriously as car keys, which have become very high tech indeed. Modern car keys cannot be duplicated at a suburban locksmith. It's possible you've come across office and filing cabinet keys that carry government security certifications. And we never use the same keys for our homes and offices; we wouldn't even consider it (which points to the basic weirdness in Single Sign On and identity federation).
In stark contrast to car keys, almost no attention is paid to the pedigree of digital identities. Technology neutrality has bred a bewildering array of ad hoc authentication methods, including SMS messages, one time password generators, password calculators, grid cards and picture passwords; at the same time we've done nothing at all to inhibit the re-use of stolen IDs.
It's high time government and industry got working together on a uniform and universal set of smart identity tools to properly protect consumers online.
Stay tuned for more of my thoughts on identity safety, inspired by recent news that health identifiers may be back on the table in the gigantic U.S. e-health system. The security and privacy issues are large but the cyber safety technology is at hand!
Constellation Research analysts are wrapping up a very busy 2014 with a series of "State of the State" reports. For my part I've looked at the state of privacy, which I feel is entering its adolescent stage.
Here's a summary.
1. Consumers have not given up privacy - they've been tricked out of it.
The impression is easily formed that people just don’t care about privacy anymore, but in fact people are increasingly frustrated with privacy invasions. They’re tired of social networks mining users’ personal lives; they are dismayed that video game developers can raid a phone’s contact lists with impunity; they are shocked by the deviousness of Target analyzing women’s shopping histories to detect pregnant customers; and they are revolted by the way magnates help themselves to operational data like Uber’s passenger movements for fun or allegedly for harassment – just because they can.
2. Private sector surveillance is overshadowed by government intrusion, but is arguably just as bad.
Edward Snowden’s revelations of a massive military-industrial surveillance effort were of course shocking, but they should not steal all the privacy limelight. In parallel with and well ahead of government spy programs, the big OSNs and search engine companies have been gathering breathtaking amounts of data, all in the interests of targeted advertising. These data stores have come to the attention of the FBI and CIA who must be delighted that someone else has done so much of their spying for them. These businesses boast that they know us better than we know ourselves. That’s chilling. We need to break through into a post-Snowden world.
3. The U.S. is the Canary Islands of privacy.
The United States remains the only major economy without broad-based information privacy laws. There are almost no restraints on what American businesses may do with personal information they collect from their customers, or synthesize from their operations. In the rest of the world, most organizations must restrict their collection of data, limit the repurposing of data, and disclose their data handling practices in full. Individuals may want to move closer to European-style privacy protection, while many corporations prefer the freedom they have in America to hang on to any data they like while they figure out how to make money out of it. Digital companies like to call this “innovation” and grandiose claims are made about its criticality for the American economy, but many consumers would prefer the sort of innovation that respects their privacy while delivering value-for-data.
4. Privacy is more about politics than technology.
Privacy can be seen as a power play between individual rights and the interests of governments and businesses. Most of us actually want businesses to know quite a lot about us, but we expect them to respect what they know and to be restrained in how they use it. Privacy is less about what organizations do with information than what they choose not to do with it. Hence, privacy cannot be a technology issue. It is not about keeping things secret but rather, keeping them close. Privacy is actually the protection we need when things are not secret.
5. Land grab for “public” data accelerates.
6. Data literacy will be key to digital safety.
Computer literacy is one thing, but data literacy is different and less tangible. We have strong privacy intuitions that have evolved over centuries but in cyberspace we lose our bearings. We don’t have the familiar social cues when we go online, so now we need to develop new ones. And we need to build up a common understanding of how data flows in the digital economy. Today we train kids in financial literacy to engender a first-hand sense of how commerce works; data literacy may become even more important as a life skill. It's more than being able to work an operating system, a device and umpteen apps. It means having meaningful mental models of what goes on in computers. Without understanding this, we can’t construct effective privacy policies or privacy labeling.
7. Privacy will get worse before it gets better.
Privacy is messy, even where data protection rules are well entrenched. Consider the controversial Right To Be Forgotten in Europe, which requires search engine operators to provide a mechanism for individuals to request removal of old, inaccurate and harmful reports from results. The new rule has been derived from existing privacy principles, which treat the results of search algorithms as a form of synthesis rather than a purely objective account of history, and therefore hold the search companies partly responsible for the offense their processes might produce. Yet, there are plenty of unintended consequences, and collisions with other jurisprudence. The sometimes urgent development of new protections for old civil rights is never plain sailing.
My report "Privacy Enters Adolescence" can be downloaded here. It expands on the points above, and sets out recommendations for improving awareness of how Personal Data flows in the digital economy, negotiating better deals in the data-for-value bargain, the conduct of Privacy Impact Assessments, and developing a "Privacy Bill of Rights".
An Engadget report today, "Hangouts eavesdrops on your chats to offer 'smart suggestions'" describes a new "spy/valet" feature being added to Google's popular video chat tool.
- "Google's Hangouts is gaining a handy, but slightly creepy new feature today. The popular chat app will now act as a digital spy-slash-valet by eavesdropping on your conversations to offer 'smart suggestions.' For instance, if a pal asks 'where are you?' it'll immediately prompt you to share your location, then open a map so you can pin it precisely."
It's sad that this sort of thing still gets meekly labeled as "creepy". The privacy implications are serious and pretty easy to see.
Google is evidently processing the text of Hangouts as they fly through their system, extracting linguistic cues, interpreting what's being said using Artificial Intelligence, extracting new meaning and insights, and offering suggestions.
We need some clarification about whether any covert tests of this technology have been undertaken during the R&D phase. A company obviously doesn't launch a new product like this without a lot of research, feasibility testing, prototyping and testing. Serious work on 'smart suggestions' would not start without first testing how it works in real life. So I wonder if any of this evaluation was done covertly on live data? Are Google researchers routinely eavesdropping on hangouts to develop the 'smart suggestions' technology?
Many people have said to me I'm jumping the gun, and that Google would probably test the new Hangouts feature on its own employees. Perhaps, but given that scanning gmails is situation normal for Google, and they have a "privacy" culture that joins up all their business units so that data may be re-purposed almsot without limit, I feel sure that running AI algorithms on text without telling people would be par for the course.
In development and in operation, we need to know what steps are taken to protect the privacy of hangout data. What personally identifiable data and metadata is retained for other purposes? Who inside Google is granted access to the data and especially the synthtised insights? How long does any secondary usage persist for? Are particularly sensitive matters (like health data, financial details, corporate intellectual property etc.) filtered out?
This is well beyond "creepy". Hangouts and similar video chat are certainly wonderdful technologies. We're using them routinely for teaching, education, video conferencing, collaboration and consultation. The tools may become entrenched in corporate meetings, telecommuting, healthcare and the professions. But if I am talking with my doctor, or discussing patents with my legal team, or having a clandestine chat with a lover, I clearly do not want any unsolicited contributions from the service provider. More fundamentally, I want assurance that no machine is ever tapping into these sorts of communications, running AI algorithms, and creating new insights. If I'm wrong about covert testing on live data, then Google could do what Apple did and publish an Open Letter clarifying their data usage practices and strategies.
Come to think of it, if Google is running natural language processing algorithms over the Hangouts stream, might they be augmenting their gmail scanning the same way? Their business model is to extract insights about users from any data they get their hands on. Until now it's been a crude business of picking out keywords and using them to profile users' interests and feed targeted advertising. But what if they could get deeper information about us through AI? Is there any sign from their historical business practices that Google would not do this? And what if they can extract sensitive information like mental health indications? Even with good intent and transarency, predicting healthcare from social media is highly problematic as shown by the "Samaritans Radar" experience.
Artificial Intelligence is one of the new frontiers. Hot on the heels of the successes of IBM Watson, we're seeing Natural Language Processing and analytics rapidly penetrate business and now consumer applications. Commentators are alternately telling us that AI will end humanity, and not to worry about it. For now, I call on people to simply think clearly through the implications, such as for privacy. If AI programs are clever enough to draw deep insights about us from what we say, then the "datapreneurs" in charge of those algorithms need to remember they are just as accountable for privacy as if they have asked us reveal all by filling out a questionnaire.
A letter to the editor of The Saturday Paper, published Nov 15, 2014.
In his otherwise fresh and sympathetic “Web of abuse” (November 8-14), Martin McKenzie-Murray unfortunately concludes by focusing on the ability of victims of digital hate to “[rationally] assess their threat level”. More’s the point, symbolic violence is still violent. The threat of sexual assault by men against women is inherently terrifying and damaging, whether it is carried out or not. Any attenuation of the threat of rape dehumanises all of us.
There’s a terrible double standard among cyber-libertarians. When good things happen online – such as the Arab Spring, WikiLeaks, social networking and free education – they call the internet a transformative force for good. Yet they can play down digital hate crimes as “not real”, and disown their all-powerful internet as just another communications medium.
Stephen Wilson, Five Dock, NSW.
In response to "The Solace of Oblivion", Jeffrey Toobin, The New Yorker, September 29th, 2014.
The "Right to be Forgotten" is an unfortunate misnomer for a balanced data control measure decided by the European Court of Justice. The new rule doesn't seek to erase the past but rather to restore some of its natural distance. Privacy is not about secrecy but moderation. Yet restraint is toxic to today's information magnates, and the response of some to even the slightest throttling of their control of data has been shrill. Google doth protest too much when it complains that having to adjust its already very elaborate search filters makes it an unwilling censor.
The result of a multi-billion dollar R&D program, Google's search engine is a great deal more than a latter-day microfiche. Its artificial intelligence tries to predict what users are really looking for, and as a result, its insights are all the more valuable to Google's real clients -- the advertisers. No search result is a passive reproduction of data from a "public domain". Google makes the public domain public. So if search reveals Personal Information that was hitherto impossible to find, then Google should at least participate in helping to temper the unintended privacy consequences.
October 1, 2014.
Few technologies are so fundamental and yet so derided at the same time as public key infrastructure. PKI is widely thought of as obsolete or generically intrusive yet it is ubiquitous in SIM cards, SSL, chip and PIN cards, and cable TV. Technically, public key infrastructure Is a generic term for a management system for keys and certificates; there have always been endless ways to build PKIs (note the plural) for different communities, technologies, industries and outcomes. And yet “PKI” has all too often come to mean just one way of doing identity management. In fact, PKI doesn’t necessarily have anything to do with identity at all.
This blog is an edited version of a feature I once wrote for SC Magazine. It is timely in the present day to re-visit the principles that make for good PKI implementations and contextualise them in one of the most contemporary instances of PKI: the FIDO Alliance protocols for secure attribute management. In my view, FIDO realises PKI ‘as nature intended’.
In their earliest conceptions in the early-to-mid 1990s, digital certificates were proposed to authenticate nondescript transactions between parties who had never met. Certificates were construed as the sole means for people to authenticate one another. Most traditional PKI was formulated with no other context; the digital certificate was envisaged to be your all-purpose digital identity.
Orthodox PKI has come in for spirited criticism. From the early noughties, many commentators pointed to a stark paradox: online transaction volumes and values were increasing rapidly, in almost all cases without the help of overt PKI. Once thought to be essential, with its promise of "non repdudiation", PKI seemed anything but, even for significant financial transactions.
There were many practical problems in “big” centralised PKI models. The traditional proof of identity for general purpose certificates was intrusive; the legal agreements were complex and novel; and private key management was difficult for lay people. So the one-size-fits-all electronic passport failed to take off. But PKI's critics sometimes throw the baby out with the bathwater.
In the absence of any specific context for its application, “big” PKI emphasized proof of personal identity. Early certificate registration schemes co-opted identification benchmarks like that of the passport. Yet hardly any regular business transactions require parties to personally identify one another to passport standards.
”Electronic business cards”
Instead in business we deal with others routinely on the basis of their affiliations, agency relationships, professional credentials and so on. The requirement for orthodox PKI users to submit to strenuous personal identity checks over and above their established business credentials was a major obstacle in the adoption of digital certificates.
It turns out that the 'killer applications' for PKI overwhelmingly involve transactions with narrow contexts, predicated on specific credentials. The parties might not know each other personally, but invariably they recognize and anticipate each other's qualifications, as befitting their business relationship.
Successful PKI came to be characterized by closed communities of interest, prior out-of-band registration of members, and in many cases, special-purpose application software featuring additional layers of context, security and access controls.
So digital certificates are much more useful when implemented as application-specific 'electronic business cards,' than as one-size-fits-all electronic passports. And, by taking account of the special conditions that apply to different e-business processes, we have the opportunity to greatly simplify the registration processes, user experience and liability arrangements that go with PKI.
The real benefits of digital signatures
There is a range of potential advantages in using PKI, including its cryptographic strength and resistance to identity theft (when implemented with private keys in hardware). Many of its benefits are shared with other technologies, but at least two are unique to PKI.
First, digital signatures provide robust evidence of the origin and integrity of electronic transactions, persistent over time and over 'distance’ (that is, the separation of sender and receiver). This greatly simplifies audit logging, evidence collection and dispute resolution, and cuts the future cost of investigation and fraud. If a digitally signed document is archived and checked at a later date, the quality of the signature remains undiminished over many years, even if the public key certificate has long since expired. And if a digitally signed message is passed from one relying party to another and on to many more, passing through all manner of intermediate systems, everyone still receives an identical, verifiable signature code authenticating the original message.
Electronic evidence of the origin and integrity of a message can, of course, be provided by means other than a digital signature. For example, the authenticity of typical e-business transactions can usually be demonstrated after the fact via audit logs, which indicate how a given message was created and how it moved from one machine to another. However, the quality of audit logs is highly variable and it is costly to produce legally robust evidence from them. Audit logs are not always properly archived from every machine, they do not always directly evince data integrity, and they are not always readily available months or years after the event. They are rarely secure in themselves, and they usually need specialists to interpret and verify them. Digital signatures on the other hand make it vastly simpler to rewind transactions when required.
Secondly, digital signatures and certificates are machine readable, allowing the credentials or affiliations of the sender to be bound to the message and verified automatically on receipt, enabling totally paperless transacting. This is an important but often overlooked benefit of digital signatures. When processing a digital certificate chain, relying party software can automatically tell that:
- the message has not been altered since it was originally created
- the sender was authorized to launch the transaction, by virtue of credentials or other properties endorsed by a recognized Certificate Authority
- the sender's credentials were valid at the time they sent the message; and
- the authority which signed the certificate was fit to do so.
One reason we can forget about the importance of machine readability is that we have probably come to expect person-to-person email to be the archetypal PKI application, thanks to email being the classic example to illustrate PKI in action. There is an implicit suggestion in most PKI marketing and training that, in regular use, we should manually click on a digital signature icon, examine the certificate, check which CA issued it, read the policy qualifier, and so on. Yet the overwhelming experience of PKI in practice is that it suits special purpose and highly automated applications, where the usual receiver of signed transactions is in fact a computer.
Characterising good applications
Reviewing the basic benefits of digital signatures allows us to characterize the types of e-business applications that merit investment in PKI.
Applications for which digital signatures are a good fit tend to have reasonably high transaction volumes, fully automatic or straight-through processing, and multiple recipients or multiple intermediaries between sender and receiver. In addition, there may be significant risk of dispute or legal ramifications, necessitating high quality evidence to be retained over long periods of time. These include:
- Tax returns
- Customs reporting
- E-health care
- Financial trading
- Electronic conveyancing
- Superannuation administration
- Patent applications.
This view of the technology helps to explain why many first-generation applications of PKI were problematic. Retail internet banking is a well-known example of e-business which flourished without the need for digital certificates. A few banks did try to implement certificates, but generally found them difficult to use. Most later reverted to more conventional access control and backend security mechanisms.Yet with hindsight, retail funds transfer transactions did not have an urgent need for PKI, since they could make use of existing backend payment systems. Funds transfer is characterized by tightly closed arrangements, a single relying party, built-in limits on the size of each transaction, and near real-time settlement. A threat and risk assessment would show that access to internet banking can rest on simple password authentication, in exactly the same way as antecedent phone banking schemes.
Trading complexity for specificity
As discussed, orthodox PKI was formulated with the tacit assumption that there is no specific context for the transaction, so the digital certificate is the sole means for authenticating the sender. Consequently, the traditional schemes emphasized high standards of personal identity, exhaustive contracts and unusual legal devices like Relying Party Agreements. They also often resorted to arbitrary 'reliance limits,' which have little meaning for most of the applications listed on the previous page. Notoriously, traditional PKI requires users to read and understand certification practice statements (CPS).
All that overhead stemmed from not knowing what the general-purpose digital certificate was going to be used for. On the other hand, if particular digital certificates are constrained to defined applications, then the complexity surrounding their specific usage can be radically reduced.
The role of PKI in all contemporary 'killer applications' is fundamentally to help automate the online processing of electronic transactions between parties with well-defined credentials. This is in stark contrast to the way PKI has historically been portrayed, where strangers Alice and Bob use their digital certificates to authenticate context-free general messages, often presumed to be sent by email. In reality, serious business messages are never sent stranger-to-stranger with no context or cues as to the parties' legitimacy.
Using generic email is like sending a fax on plain paper. Instead, business messaging is usually highly structured. Parties have an expectation that only certain types of transactions are going to occur between them and they equip themselves accordingly (for instance, a health insurance office is not set up to handle tax returns). The sender is authorized to act in defined types of transactions by virtue of professional credentials, a relevant license, an affiliation with some authority, endorsement by their employer, and so on. And the receiver recognizes the source of those credentials. The sender and receiver typically use prescribed forms and/or special purpose application software with associated user agreements and license conditions, adding context and additional layers of security around the transaction.
PKI got smart
When PKI is used to help automate the online processing of transactions between parties in the context of an existing business relationship, we should expect the legal arrangements between the parties to still apply. For business applications where digital certificates are used to identify users in specific contexts, the question of legal liability should be vastly simpler than it is in the general purpose PKI scenario where the issuer does not know what the certificates might be used for.
The new vision for PKI means the technology and processes should be no more of a burden on the user than a bank card. Rather than imagine that all public key certificates are like general purpose electronic passports, we can deploy multiple, special purpose certificates, and treat them more like electronic business cards. A public key certificate issued on behalf of a community of business users and constrained to that community can thereby stand for any type of professional credential or affiliation.
We can now automate and embed the complex cryptography deeply into smart devices -- smartcards, smart phones, USB keys and so on -- so that all terms and conditions for use are application focused. As far as users are concerned, a smartcard can be deployed in exactly the same way as any magnetic stripe card, without any need to refer to - or be limited by - the complex technology contained within (see also Simpler PKI is on the cards). Any application-specific smartcard can be issued under rules and controls that are fit for their purpose, as determined by the community of users or an appropriate recognized authority. There is no need for any user to read a CPS. Communities can determine their own evidence-of-identity requirements for issuing cards, instead of externally imposed personal identity checks. Deregulating membership rules dramatically cuts the overheads traditionally associated with certificate registration.
Finally, if we constrain the use of certificates to particular applications then we can factor the intended usage into PKI accreditation processes. Accreditation could then allow for particular PKI scheme rules to govern liability. By 'black-boxing' each community's rules and arrangements, and empowering the community to implement processes that are fit for its purpose, the legal aspects of accreditation can be simplified, reducing one of the more significant cost components of the whole PKI exercise (having said that, it never ceases to amaze how many contemporary healthcare PKIs still cling onto face-to-face passport grade ID proofing as if that's the only way to do digital certificates).
The preceding piece is a lightly edited version of the article ”Rethinking PKI” that first appeared in Secure Computing Magazine in 2003. Now, over a decade later, we’re seeing the same principles realised by the FIDO Alliance.
The FIDO protocols U2F and UAF enable specific attributes of a user and their smart devices to be transmitted to a server. Inherent to the FIDO methods are digital certificates that confer attributes and not identity, relatively large numbers of private keys stored locally in the users’ devices (and without the users needing to be aware of them as such) and digital signatures automatically applied to protocol messages to bind the relevant attributes to the authentication exchanges.
Surely, this is how PKI should have been deployed all along.