The Australian government is to revamp the troubled Personally Controlled Electronic Health Record (PCEHR). In line with the Royle Review from Dec 2013, it is reported that patient participation is to change from the current Opt-In model to Opt-Out; see "Govt to make e-health records opt-out" by Paris Cowan, IT News.
That is to say, patient data from hospitals, general practice, pathology and pharmacy will be added by default to a central longitudinal health record, unless patients take steps (yet to be specified) to disable sharing.
The main reason for switching the consent model is simply to increase the take-up rate. But it's a much bigger change than many seem to realise.
The government is asking the community to trust it to hold essentially all medical records. Are the PCEHR's security and privacy safeguards up to scratch to take on this grave responsibility? I argue the answer is no, on two grounds.
Firstly there is the practical matter of PCEHR's security performance to date. It's not good, based on publicly available information. On multiple occasions, prescription details have been uploaded from community pharmacy to the wrong patient's records. There have been a few excuses made for this error, with blame sheeted home to the pharmacy. But from a system's perspective -- and health care is all about the systems -- you cannot pass the buck like that. Pharmacists are using a PCEHR system that was purportedly designed for them. And it was subject to system-wide threat & risk assessments that informed the architecture and design of not just the electronic records system but also the patient and healthcare provider identification modules. How can it be that the PCEHR allows such basic errors to occur?
Secondly and really fundamentally, you simply cannot invert the consent model as if it's a switch in the software. The privacy approach is deep in the DNA of the system. Not only must PCEHR security be demonstrably better than experience suggests, but it must be properly built in, not retrofitted.
Let me explain how the consent approach crops up deep in the architecture of something like PCEHR. During analysis and design, threat & risk assessments (TRAs) and privacy impact assessments (PIAs) are undertaken, to identify things that can go wrong, and to specify security and privacy controls. These controls generally comprise a mix of technology, policy and process mechanisms. For example, if there is a risk of patient data being sent to the wrong person or system, that risk can be mitigated a number of ways, including authentication, user interface design, encryption, contracts (that obligate receivers to act responsibly), and provider and patient information. The latter are important because, as we all should know, there is no such thing as perfect security. Mistakes are bound to happen.
One of the most fundamental privacy controls is participation. Individuals usually have the ultimate option of staying away from an information system if they (or their advocates) are not satisfied with the security and privacy arrangements. Now, these are complex matters to evaluate, and it's always best to assume that patients do not in fact have a complete understanding of the intricacies, the pros and cons, and the net risks. People need time and resources to come to grips with e-health records, so a default opt-in affords them that breathing space. And it errs on the side of caution, by requiring a conscious decision to participate. In stark contrast, a default opt-out policy embodies a position that the scheme operator believes it knows best, and is prepared to make the decision to participate on behalf of all individuals.
Such a position strikes many as beyond the pale, just on principle. But if opt-out is the adopted policy position, then clearly it has to be based on a risk assessment where the pros indisputably out-weigh the cons. And this is where making a late switch to opt-out is unconscionable.
You see, in an opt-in system, during analysis and design, whenever a risk is identified that cannot be managed down to negligible levels by way of technology and process, the ultimate safety net is that people don't need to use the PCEHR. It is a formal risk management ploy (a part of the risk manager's toolkit) to sometimes fall back on the opt-in policy. In an opt-in system, patients sign an agreement in which they accept some risk. And the whole security design is predicated on that.
Look at the most recent PIA done on the PCEHR in 2011; section 9.1.6 "Proposed solutions - legislation" makes it clear that opt-in participation is core to the existing architecture. The PIA makes a "critical legislative recommendation" including:
- a number of measures to confirm and support the 'opt in' nature of the PCEHR for consumers (Recommendations 4.1 to 4.3) [and] preventing any extension of the scope of the system, or any change to the 'opt in' nature of the PCEHR.
The PIA at section 2.2 also stresses that a "key design feature of the PCEHR System ... is opt in – if a consumer or healthcare provider wants to participate, they need to register with the system." And that the PCEHR is "not compulsory – both consumers and healthcare providers choose whether or not to participate".
A PDF copy of the PIA report, which was publicly available at the Dept of Health website for a few years after 2011, is archived here.
The fact is that if the government changes the PCEHR from opt-in to opt-out, it will invalidate the security and privacy assessments done to date. The PIAs and TRAs will have to be repeated, and the project must be prepared for major redesign.
The Royle Review report (PDF) did in fact recommend "a technical assessment and change management plan for an opt-out model ..." (Recommendation 14) but I am not aware that such a review has taken place.
To look at the seriousness of this another way, think about "Privacy by Design", the philosophy that's being steadily adopted across government. In 2014 NEHTA wrote in a submission (PDF) to the Australian Privacy Commissioner:
- The principle that entities should employ “privacy by design” by building privacy into their processes, systems, products and initiatives at the design stage is strongly supported by NEHTA. The early consideration of privacy in any endeavour ensures that the end product is not only compliant but meets the expectations of stakeholders.
One of the tenets of Privacy by Design is that you cannot bolt on privacy after a design is done. Privacy must be designed into the fabric of any system from the outset. All the way along, PCEHR has assumed opt-in, and the last PIA enshrined that position.
If the government was to ignore its own Privacy by Design credo, and not revisit the PCEHR architecture, it would be an amazing breach of the public's trust in the healthcare system.
Ray Wang tells us now that writing a book and launching a company are incredibly fulfilling things to do - but ideally, not at the same time. He thought it would take a year to write "Disrupting Digital Business", but since it overlapped with building Constellation Research, it took three! But at the same time, his book is all the richer for that experience.
Ray is on a world-wide book tour (tweeting under the hash tag #cxotour). I was thrilled to participate in the Melbourne leg last week. We convened a dinner at Melbourne restaurant The Deck and were joined by a good cross section of Australian private and public sector businesses. There were current and recent executives from Energy Australia, Rio Tinto, the Victorian Government and Australia Post among others, plus the founders of several exciting local start-ups. And we were lucky to have special guests Brian Katz and Ben Robbins - two renowned mobility gurus.
The format for all the launch events has one or two topical short speeches from Constellation analysts and Associates, and a fireside chat by Ray. In Melbourne, we were joined by two of Australia's deep digital economy experts, Gavin Heaton and Joanne Jacobs. Gavin got us going on the night, surveying the importance of innovation, and the double edged opportunities and threats of digital disruption.
Then Ray spoke off-the-cuff about his book, summarising years of technology research and analysis, and the a great many cases of business disruption, old and new. Ray has an encyclopedic grasp of tech-driven successes and failures going back decades, yet his presentations are always up-to-the-minute and full of practical can-do calls to action. He's hugely engaging, and having him on a small stage for a change lets him have a real conversation with the audience.
Speaking with no notes and PowerPoint-free, Ray ranged across all sorts of disruptions in all sorts of sectors, including:
- Sony's double cassette Walkman (which Ray argues playfully was their "last innovation")
- Coca Cola going digital, and the speculative "ten cent sip"
- the real lesson of the iPhone: geeks spend time arguing about whether Apple's technology is original or appropriated, when the point is their phone disrupted 20 or more other business models
- the contrasting Boeing 787 Dreamliner and Airbus A380 mega jumbo - radically different ways to maximise the one thing that matters to airlines: dollars per passenger-miles, and
- Uber, which observers don't always fully comprehend as a rich mix of mobility, cloud and Big Data.
And I closed the scheduled part of the evening with a provocation on privacy. I asked the group to think about what it means to call any online business practice "creepy". Have community norms and standards really changed in the move online? What's worse: government surveillance for political ends, or private sector surveillance for profit? If we pay for free online services with our personal information, do regular consumers understand the bargain? And if cynics have been asking "Is Privacy Dead?" for over 100 years, doesn't it mean the question is purely rhetorical? Who amongst us truly wants privacy to be over?!
The discussion quickly attained a life of its own - muscular, but civilized. And it provided ample proof that whatever you think about privacy, it is complicated and surprising, and definitely disruptive! (For people who want to dig further into the paradoxes of modern digital privacy, Ray and I recently recorded a nice long chat about it).
Here are some of the Digital Disruption tour dates coming up:
Acknowledgement: Daniel Barth-Jones kindly engaged with me after this blog was initially published, and pointed out several significant factual errors, for which I am grateful.
In 2014, the New York Taxi & Limousine Company (TLC) released a large "anonymised" dataset containing 173 million taxi rides taken in 2013. Soon after, software developer Vijay Pandurangan managed to undo the hashed taxi registration numbers. Subsequently, privacy researcher Anthony Tockar went on to combine public photos of celebrities getting in or out of cabs, to recreate their trips. See Anna Johnston's analysis here.
This re-identification demonstration has been used by some to bolster a general claim that anonymity online is increasingly impossible.
On the other hand, medical research advocates like Columbia University epidemiologist Daniel Barth-Jones argue that the practice of de-identification can be robust and should not be dismissed as impractical on the basis of demonstrations such as this. The identifiability of celebrities in these sorts of datasets is a statistical anomaly reasons Barth-Jones and should not be used to frighten regular people out of participating in medical research on anonymised data. He wrote in a blog that:
- "However, it would hopefully be clear that examining a miniscule proportion of cases from a population of 173 million rides couldn’t possibly form any meaningful basis of evidence for broad assertions about the risks that taxi-riders might face from such a data release (at least with the taxi medallion/license data removed as will now be the practice for FOIL request data)."
As a health researcher, Barth-Jones is understandably worried that re-identification of small proportions of special cases is being used to exaggerate the risks to ordinary people. He says that the HIPAA de-identification protocols if properly applied leave no significant risk of re-id. But even if that's the case, HIPAA processes are not applied to data across the board. The TLC data was described as "de-identified" and the fact that any people at all (even stand-out celebrities) could be re-identified from data does create a broad basis for concern - "de-identified" is not what it seems. Barth-Jones stresses that in the TLC case, the de-identification was fatally flawed [technically: it's no use hashing data like registration numbers with limited value ranges because the hashed values can be reversed by brute force] but my point is this: who among us who can tell the difference between poorly de-identified and "properly" de-identified?
And how long can "properly de-identified" last? What does it mean to say casually that only a "minuscule proportion" of data can be re-identified? In this case, the re-identification of celebrities was helped by the fact lots of photos of them are readily available on social media, yet there are so many photos in the public domain now, regular people are going to get easier to be identified.
But my purpose here is not to play what-if games, and I know Daniel advocates statistically rigorous measures of identifiability. We agree on that -- in fact, over the years, we have agreed on most things. The point I am trying to make in this blog post is that, just as nobody should exaggerate the risk of re-identification, nor should anyone play it down. Claims of de-identification are made almost daily for really crucial datasets, like compulsorily retained metadata, public health data, biometric templates, social media activity used for advertising, and web searches. Some of these claims are made with statistical rigor, using formal standards like the HIPAA protocols; but other times the claim is casual, made with no qualification, with the aim of comforting end users.
"De-identified" is a helluva promise to make, with far-reaching ramifications. Daniel says de-identification researchers use the term with caution, knowing there are technical qualifications around the finite probability of individuals remaining identifiable. But my position is that the fine print doesn't translate to the general public who only hear that a database is "anonymous". So I am afraid the term "de-identified" is meaningless outside academia, and in casual use is misleading.
Barth-Jones objects to the conclusion that "it's virtually impossible to anonymise large data sets" but in an absolute sense, that claim is surely true. If any proportion of people in a dataset may be identified, then that data set is plainly not "anonymous". Moreover, as statistics and mathematical techniques (like facial recognition) improve, and as more ancillary datasets (like social media photos) become accessible, the proportion of individuals who may be re-identified will keep going up.[Readers who wish to pursue these matters further should look at the recent Harvard Law School online symposium on "Re-identification Demonstrations", hosted by Michelle Meyer, in which Daniel Barth-Jones and I participated, among many others.]
Both sides of this vexed debate need more nuance. Privacy advocates have no wish to quell medical research per se, nor do they call for absolute privacy guarantees, but we do seek full disclosure of the risks, so that the cost-benefit equation is understood by all. One of the obvious lessons in all this is that "anonymous" or "de-identified" on their own are poor descriptions. We need tools that meaningfully describe the probability of re-identification. If statisticians and medical researchers take "de-identified" to mean "there is an acceptably small probability, namely X percent, of identification" then let's have that fine print. Absent the detail, lay people can be forgiven for thinking re-identification isn't going to happen. Period.
And we need policy and regulatory mechanisms to curb inappropriate re-identification. Anonymity is a brittle, essentially temporary, and inadequate privacy tool.
I argue that the act of re-identification ought to be treated as an act of Algorithmic Collection of PII, and regulated as just another type of collection, albeit an indirect one. If a statistical process results in a person's name being added to a hitherto anonymous record in a database, it is as if the data custodian went to a third party and asked them "do you know the name of the person this record is about?". The fact that the data custodian was clever enough to avoid having to ask anyone about the identity of people in the re-identified dataset does not alter the privacy responsibilities arising. If the effect of an action is to convert anonymous data into personally identifiable information (PII), then that action collects PII. And in most places around the world, any collection of PII automatically falls under privacy regulations.
It looks like we will never guarantee anonymity, but the good news is that for privacy, we don't actually need to. Privacy is the protection you need when you affairs are not anonymous, for privacy is a regulated state where organisations that have knowledge about you are restrained in what they do with it. Equally, the ability to de-anonymise should be restricted in accordance with orthodox privacy regulations. If a party chooses to re-identify people in an ostensibly de-identified dataset, without a good reason and without consent, then that party may be in breach of data privacy laws, just as they would be if they collected the same PII by conventional means like questionnaires or surveillance.
Surely we can all agree that re-identification demonstrations serve to shine a light on the comforting claims made by governments for instance that certain citizen datasets can be anonymised. In Australia, the government is now implementing telecommunications metadata retention laws, in the interests of national security; the metadata we are told is de-identified and "secure". In the UK, the National Health Service plans to make de-identified patient data available to researchers. Whatever the merits of data mining in diverse fields like law enforcement and medical research, my point is that any government's claims of anonymisation must be treated critically (if not skeptically), and subjected to strenuous and ongoing privacy impact assessment.
Privacy, like security, can never be perfect. Privacy advocates must avoid giving the impression that they seek unrealistic guarantees of anonymity. There must be more to privacy than identity obscuration (to use a more technically correct term than "de-identification"). Medical research should proceed on the basis of reasonable risks being taken in return for beneficial outcomes, with strong sanctions against abuses including unwarranted re-identification. And then there wouldn't need to be a moral panic over re-identification if and when it does occur, because anonymity, while highly desirable, is not essential for privacy in any case.
The State Of Identity Management in 2015
Constellation Research recently launched the "State of Enterprise Technology" series of research reports. These assess the current enterprise innovations which Constellation considers most crucial to digital transformation, and provide snapshots of the future usage and evolution of these technologies.
My second contribution to the state-of-the-state series is "Identity Management Moves from Who to What". Here's an excerpt from the report:
In spite of all the fuss, personal identity is not usually important in routine business. Most transactions are authorized according to someone’s credentials, membership, role or other properties, rather than their personal details. Organizations actually deal with many people in a largely impersonal way. People don’t often care who someone really is before conducting business with them. So in digital Identity Management (IdM), one should care less about who a party is than what they are, with respect to attributes that matter in the context we’re in. This shift in focus is coming to dominate the identity landscape, for it simplifies a traditionally multi-disciplined problem set. Historically, the identity management community has made too much of identity!
Six Digital Identity Trends for 2015
1. Mobile becomes the center of gravity for identity. The mobile device brings convergence for a decade of progress in IdM. For two-factor authentication, the cell phone is its own second factor, protected against unauthorized use by PIN or biometric. Hardly anyone ever goes anywhere without their mobile - service providers can increasingly count on that without disenfranchising many customers. Best of all, the mobile device itself joins authentication to the app, intimately and seamlessly, in the transaction context of the moment. And today’s phones have powerful embedded cryptographic processors and key stores for accurate mutual authentication, and mobile digital wallets, as Apple’s Tim Cook highlighted at the recent White House Cyber Security Summit.
2. Hardware is the key – and holds the keys – to identity. Despite the lure of the cloud, hardware has re-emerged as pivotal in IdM. All really serious security and authentication takes place in secure dedicated hardware, such as SIM cards, ATMs, EMV cards, and the new Trusted Execution Environment mobile devices. Today’s leading authentication initiatives, like the FIDO Alliance, are intimately connected to standard cryptographic modules now embedded in most mobile devices. Hardware-based identity management has arrived just in the nick of time, on the eve of the Internet of Things.
3. The “Attributes Push” will shift how we think about identity. In the words of Andrew Nash, CEO of Confyrm Inc. (and previously the identity leader at PayPal and Google), “Attributes are at least as interesting as identities, if not more so.” Attributes are to identity as genes are to organisms – they are really what matters about you when you’re trying to access a service. By fractionating identity into attributes and focusing on what we really need to reveal about users, we can enhance privacy while automating more and more of our everyday transactions.
The Attributes Push may recast social logon. Until now, Facebook and Google have been widely tipped to become “Identity Providers”, but even these giants have found federated identity easier said than done. A dark horse in the identity stakes – LinkedIn – may take the lead with its superior holdings in verified business attributes.
4. The identity agenda is narrowing. For 20 years, brands and organizations have obsessed about who someone is online. And even before we’ve solved the basics, we over-reached. We've seen entrepreneurs trying to monetize identity, and identity engineers trying to convince conservative institutions like banks that “Identity Provider” is a compelling new role in the digital ecosystem. Now at last, the IdM industry agenda is narrowing toward more achievable and more important goals - precise authentication instead of general identification.
5. A digital identity stack is emerging. The FIDO Alliance and others face a challenge in shifting and improving the words people use in this space. Words, of course, matter, as do visualizations. IdM has suffered for too long under loose and misleading metaphors. One of the most powerful abstractions in IT was the OSI networking stack. A comparable sort of stack may be emerging in IdM.
6. Continuity will shape the identity experience. Continuity will make or break the user experience as the lines blur between real world and virtual, and between the Internet of Computers and the Internet of Things. But at the same time, we need to preserve clear boundaries between our digital personae, or else privacy catastrophes await. “Continuous” (also referred to as “Ambient”) Authentication is a hot new research area, striving to provide more useful and flexible signals about the instantaneous state of a user at any time. There is an explosion in devices now that can be tapped for Continuous Authentication signals, and by the same token, rich new apps in health, lifestyle and social domains, running on those very devices, that need seamless identity management.
A snapshot at my report "Identity Moves from Who to What" is available for download at Constellation Research. It expands on the points above, and sets out recommendations for enterprises to adopt the latest identity management thinking.
I had a letter to the editor published in Nature on big data and privacy.
Nature 519, 414 (26 March 2015) doi:10.1038/519414a
Published online 25 March 2015
Letter as published
Privacy issues around data protection often inspire over-engineered responses from scientists and technologists. Yet constraints on the use of personal data mean that privacy is less about what is done with information than what is not done with it. Technology such as new algorithms may therefore be unnecessary (see S. Aftergood, Nature 517, 435–436; 2015).
Technology-neutral data-protection laws afford rights to individuals with respect to all data about them, regardless of the data source. More than 100 nations now have such data-privacy laws, typically requiring organizations to collect personal data only for an express purpose and not to re-use those data for unrelated purposes.
If businesses come to know your habits, your purchase intentions and even your state of health through big data, then they have the same privacy responsibilities as if they had gathered that information directly by questionnaire. This is what the public expects of big-data algorithms that are intended to supersede cumbersome and incomplete survey methods. Algorithmic wizardry is not a way to evade conventional privacy laws.
Constellation Research, Sydney, Australia.
Yawn. Alexander Nazaryan in Newsweek (March 22) has penned yet another tirade against privacy.
His column is all strawman. No one has ever said privacy is more important than other rights and interests. The infamous Right to be Forgotten is a case in point -- the recent European ruling is expressly about balancing competing interests, around privacy and public interest. All privacy rules and regulations, our intuitions and habits, all concede there may be over-riding factors in the mix.
So where on earth does the author and his editors get the following shrill taglines from?
- "You’re 100% Wrong About Privacy"
- "Our expectation of total online privacy is unrealistic and dangerous"
- "Total privacy is a dangerous delusion".
It is so tiresome that we advocates have to keep correcting grotesque misrepresentations of our credo. The right to be let alone was recognised in American law 125 years ago, and was written into the UN International Covenant on Civil and Political Rights in 1966. Every generation witnesses again the rhetorical question "Is Privacy Dead?" (see Newsweek, 27 July 1970). The answer, after fifty years, is still "no". The very clear trend worldwide is towards more privacy regulation, not less.
Funnily enough, Nazaryan makes a case for privacy himself, when he reminds us by-the-by that "the feds do covertly collect data about us, often with the complicity of high-tech and telecom corporations" and that "any user of Google has to expect that his/her information will be used for commercial gain". Most reasonable people look to privacy to address such ugly imbalances!
Why are critics of privacy so coldly aggressive? If Nazaryan feels no harm comes from others seeing him searching porn, then we might all admire his confidence. But is it any of his business what the rest of us do in private? Or the government's business, or Google's?
Privacy is just a fundamental matter of restraint. People should only have their personal information exposed on a need-to-know basis. Individuals don't have to justify their desire for privacy! The onus must be on the watchers to justify their interests.
Why do Alexander Nazaryan and people of his ilk so despise privacy? I wonder what political or commercial agendas they have to hide?
Posted in Privacy
In May 2014, the European Court of Justice (ECJ) ruled that under European law, people have the right to have certain information about them delisted from search engine results. The ECJ ruling was called the "Right to be Forgotten", despite it having little to do with forgetting (c'est la vie). Shortened as RTBF, it is also referred to more clinically as the "Right to be Delisted" (or simply as "Google Spain" because that was one of the parties in the court action). Within just a few months, the RTBF has triggered conferences, public debates, and a TEDx talk.
Google itself did two things very quickly in response to the RTBF ruling. First, it mobilised a major team to process delisting requests. This is no mean feat -- over 200,000 requests have been received to date; see Google's transparency report. However it's not surprising they got going so quickly as they already have well-practiced processes for take-down notices for copyright and unlawful material.
Secondly, the company convened an Advisory Council of independent experts to formulate strategies for balancing the competing rights and interests bound up in RTBF. The Advisory Council delivered its report in January; it's available online here.
I declare I'm a strong supporter of RTBF. I've written about it here and here, and participated in an IEEE online seminar. I was impressed by the intellectual and eclectic make-up of the Council, which includes a past European Justice Minister, law professors, and a philosopher. And I do appreciate that the issues are highly complex. So I had high expectations of the Council's report.
Yet I found it quite barren.
Recap - the basics of RTBF
EU Justice Commissioner Martine Reicherts in a speech last August gave a clear explanation of the scope of the ECJ ruling, and acknowledged its nuances. Her speech should be required reading. Reicherts summed up the situation thus:
- What did the Court actually say on the right to be forgotten? It said that individuals have the right to ask companies operating search engines to remove links with personal information about them – under certain conditions - when information is inaccurate, inadequate, irrelevant, outdated or excessive for the purposes of data processing. The Court explicitly ruled that the right to be forgotten is not absolute, but that it will always need to be balanced against other fundamental rights, such as the freedom of expression and the freedom of the media – which, by the way, are not absolute rights either.
Everyone concerned acknowledges there are tensions in the RTBF ruling. The Google Advisory Council Report mentions these tensions (in Section 3) but sadly spends no time critically exploring them. In truth, all privacy involves conflicting requirements, and to that extent, many features of RTBF have been seen before. At p5, the Report mentions that "the [RTBF] Ruling invokes a data subject’s right to object to, and require cessation of, the processing of data about himself or herself" (emphasis added); the reader may conclude, as I have, that the computing of search results by a search engine is just another form of data processing.
One of the most important RTBF talking points is whether it's fair that Google is made to adjudicate delisting requests. I have some sympathies for Google here, and yet this is not an entirely novel situation in privacy. A standard feature of international principles-based privacy regimes is the right of individuals to have erroneous personal data corrected (this is, for example, OECD Privacy Principle No. 7 - Individual Participation, and Australian Privacy Principle No. 13 - Correction of Personal Information). And at the top of p5, the Council Report cites the right to have errors rectified. So it is standard practice that a data custodian must have means for processing access and correction requests. Privacy regimes expect there to be dispute resolution mechanisms too, operated by the company concerned. None of this is new. What seems to be new to some stakeholders is the idea that the results of a search engine is just another type of data processing.
A little rushed
The Council explains in the Introduction to the Report that it had to work "on an accelerated timeline, given the urgency with which Google had to begin complying with the Ruling once handed down". I am afraid that the Report shows signs of being a little rushed.
- There are several spelling errors.
- The contributions from non English speakers could have done with some editing.
- Less trivially, many of the footnotes need editing; it's not always clear how a person's footnoted quote supports the text.
- More importantly, the Advisory Council surely operated with Terms of Reference, yet there is no clear explanation of what those were. At the end of the introduction, we're told the group was "convened to advise on criteria that Google should use in striking a balance, such as what role the data subject plays in public life, or whether the information is outdated or no longer relevant. We also considered the best process and inputs to Google’s decision making, including input from the original publishers of information at issue, as potentially important aspects of the balancing exercise." I'm surprised there is not a more complete and definitive description of the mission.
- It's not actually clear what sort of search we're all talking about. Not until p7 of the Report does the qualified phrase "name-based search" first appear. Are there other types of search for which the RTBF does not apply?
- Above all, it's not clear that the Council has reached a proper conclusion. The Report makes a number of suggestions in passing, and there is a collection of "ideas" at the back for improving the adjudication process, but there is no cogent set of recommendations. That may be because the Council didn't actually reach consensus.
And that's one of the most surprising things about the whole exercise. Of the eight independent Council members, five of them wrote "dissenting opinions". The work of an expert advisory committee is not normally framed as a court-like determination, from which members might dissent. And even if it was, to have the majority of members "dissent" casts doubt on the completeness or even the constitution of the process. Is there anything definite to be dissented from?
Jimmy Wales, the Wikipedia founder and chair, was especially strident in his individual views at the back of the Report. He referred to "publishers whose works are being suppressed" (p27 of the Report), and railed against the report itself, calling its recommendation "deeply flawed due to the law itself being deeply flawed". Can he mean the entire Charter of Fundamental Rights of the EU and European Convention on Human Rights? Perhaps Wales is the sort of person that denies there are any nuances in privacy, because "suppressed" is an exaggeration if we accept that RTBF doesn't cause anything to be forgotten. In my view, it poisons the entire effort when unqualified insults are allowed to be hurled at the law. If Wales thinks so little of the foundation of both the ECJ ruling and the Advisory Council, he might have declined to take part.
A little hollow
Strangely, the Council's Report is altogether silent on the nature of search. It's such a huge part of their business that I have to think the strength of Google's objection to RTBF is energised by some threat it perceives to its famously secret algorithms.
The Google business was founded on its superior Page Rank search method, and the company has spent fantastic funds on R&D, allowing it to keep a competitive edge for a very long time. And the R&D continues. Curiously, just as everyone is debating RTBF, Google researchers published a paper about a new "knowledge based" approach to evaluating web pages. Surely if page ranking was less arbitrary and more transparent, a lot of the heat would come out of RTBF.
Of all the interests to balance in RTBF, Google's business objectives are actually a legitimate part of the mix. Google provides marvelous free services in exchange for data about its users which it converts into revenue, predominantly through advertising. It's a value exchange, and it need not be bad for privacy. A key component of privacy is transparency: people have a right to know what personal information about them is collected, and why. The RTBF analysis seems a little hollow without frank discussion of what everyone gets out of running a search engine.
- The European Court of Justice Right to be Forgotten ruling.
- Academic commentary compiled by Julia Powles and Rebekah Larsen from Cambridge University.
- "Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources" by Xin Luna Dong et al, Google.
- My analysis of search and free speech, "Free search, a misnomer"
- My analysis of search as a Big Data process, "The rite to be forgotten".
A few weeks ago, Samsung was universally condemned for collecting ambient conversations through their new voice recognition Smart TV. Yet the new Hello Barbie is much worse.
Integrating natural language processing technology from ToyTalk, Mattel's high tech update to the iconic doll is said to converse with children, will get to learn voices, and will adapt the conversation in an intelligent way.
The companies say that of all the wishes they have had for Barbie, children have longed to talk to her. So now they can, the question is, at what cost?
Mario Aguilar writing in Gizmodo considers that "voice recognition technology in Hello Barbie is pretty innocuous" because he takes Mattel's word for it that they won't use conversations they collect from kids for marketing. And he accepts ToyTalk's "statement" (which it seems has not been made public) that "Mattel will only use the conversations recorded through Hello Barbie to operate and improve our products, to develop better speech recognition for children, and to improve the natural language processing of children's speech."
It's bad enough that Samsung seems to expect Smart TV buyers will study a lengthy technical privacy statement, but do we really think it's reasonable for parents to make informed consent decisions around the usage of personal information collected from a doll?
Talk about childhood's loss of innocence.
Posted in Privacy
Search engines are wondrous things. I myself use Google search umpteen times a day. I don't think I could work or play without it anymore. And yet I am a strong supporter of the contentious "Right to be Forgotten". The "RTBF" is hotly contested, and I am the first to admit it's a messy business. For one thing, it's not ideal that Google itself is required for now to adjudicate RTBF requests in Europe. But we have to accept that all of privacy is contestable. The balance of rights to privacy and rights to access information is tricky. RTBF has a long way to go, and I sense that European jurors and regulators are open and honest about this.
One of the starkest RTBF debating points is free speech. Does allowing individuals to have irrelevant, inaccurate and/or outdated search results blocked represent censorship? Is it an assault on free speech? There is surely a technical-legal question about whether the output of an algorithm represents "free speech", and as far as I can see, that question remains open. Am I the only commentator suprised by this legal blind spot? I have to say that such uncertainty destabilises a great deal of the RTBF dispute.
I am not a lawyer, but I have a strong sense that search outputs are not the sort of thing that constitutes speech. Let's bear in mind what web search is all about.
Google search is core to its multi-billion dollar advertising business. Search results are not unfiltered replicas of things found in the public domain, but rather the subtle outcome of complex Big Data processes. Google's proprietary search algorithm is famously secret, but we do know how sensitive it is to context. Most people will have noticed that search results change day by day and from place to place. But why is this?
When we enter search parameters, the result we get is actually Google's guess about what we are really looking for. Google in effect forms a hypothesis, drawing on much more than the express parameters, including our search history, browsing history, location and so on. And in all likelihood, search is influenced by the many other things Google gleans from the way we use its other properties -- gmail, maps, YouTube, hangouts and Google+ -- which are all linked now under one master data usage policy.
And here's the really clever thing about search. Google monitors how well it's predicting our real or underlying concerns. It uses a range of signals and metrics, to assess what we do with search results, and it continuously refines those processes. This is what Google really gets out of search: deep understanding of what its users are interested in, and how they are likely to respond to targeted advertising. Each search result is a little test of Google's Artificial Intelligence, which, as some like to say, is getting to know us better than we know ourselves.
As important as they are, it seems to me that search results are really just a by-product of a gigantic information business. They are nothing like free speech.
I'm going to assume readers know what's meant by the "Creepy Test" in privacy. Here's a short appeal to use the Creepy Test sparingly and carefully.
The most obvious problem with the Creepy Test is its subjectivity. One person's "creepy" can be another person's "COOL!!". For example, a friend of mine thought it was cool when he used Google Maps to check out a hotel he was going to, and the software usefully reminded him of his check-in time (evidently, Google had scanned his gmail and mashed up the registration details next time he searched for the property). I actually thought this was way beyond creepy; imagine if it wasn't a hotel but a mental health facility, and Google was watching your psychiatric appointments.
In fact, for some people, creepy might actually be cool, in the same way as horror movies or chilli peppers are cool. There's already an implicit dare in the "Nothing To Hide" argument. Some brave souls seem to brag that they haven't done anything they don't mind being made public.
Our sense of what's creepy changes over time. We can get used to intrusive technologies, and that suits the agendas of infomoplists who make fortunes from personal data, hoping that we won't notice. On the other hand, objective and technology-neutral data privacy principles have been with us for over thirty years, and by and large, they work well to address contemporary problems like facial recognition, the cloud, and augmented reality glasses.
Using folksy terms in privacy might make the topic more accessible to laypeople, but it tends to distract from the technicalities of data privacy regulations. These are not difficult matters in the scheme of things; data privacy is technically about objective and reasonable controls on the collection, use and disclosure of personally identifiable information. I encourage anyone with an interest in privacy to spend time familiarising themselves with common Privacy Principles and the definition of Personal Information. And then it's easy to see that activities like Facebook's automated face recognition and Tag Suggestions aren't merely creepy; they are objectively unlawful!
Finally and most insideously, when emotive terms like creepy are used in debating public policy, it actually disempowers the critical voices. If "creepy" is the worst thing you can say about a given privacy concern, then you're marginalised.
We should avoid being subjective about privacy. By all means, let's use the Creepy Test to help spot potential privacy problems, and kick off a conversation. But as quickly as possible, we need to reduce privacy problems to objective criteria and, with cool heads, debate the appropriate responses.
See also A Theory of Creepy: Technology, Privacy and Shifting Social Norms by Omer Tene and Jules Polonetsky.
- "Alas, intuitions and perceptions of 'creepiness' are highly subjective and difficult to generalize as social norms are being strained by new technologies and capabilities". Tene & Polonetsky.