Acknowledgement: Daniel Barth-Jones kindly engaged with me after this blog was initially published, and pointed out several significant factual errors, for which I am grateful.
In 2014, the New York City Taxi & Limousine Commission (TLC) released a large "anonymised" dataset containing 173 million taxi rides taken in 2013. Soon after, software developer Vijay Pandurangan managed to reverse the hashed taxi registration numbers. Subsequently, privacy researcher Anthony Tockar combined the dataset with public photos of celebrities getting in or out of cabs to recreate their trips. See Anna Johnston's analysis here.
This re-identification demonstration has been used by some to bolster a general claim that anonymity online is increasingly impossible.
On the other hand, medical research advocates like Columbia University epidemiologist Daniel Barth-Jones argue that the practice of de-identification can be robust and should not be dismissed as impractical on the basis of demonstrations such as this. The identifiability of celebrities in these sorts of datasets is a statistical anomaly, reasons Barth-Jones, and should not be used to frighten regular people out of participating in medical research on anonymised data. He wrote in a blog that:
- "However, it would hopefully be clear that examining a miniscule proportion of cases from a population of 173 million rides couldn’t possibly form any meaningful basis of evidence for broad assertions about the risks that taxi-riders might face from such a data release (at least with the taxi medallion/license data removed as will now be the practice for FOIL request data)."
As a health researcher, Barth-Jones is understandably worried that re-identification of small proportions of special cases is being used to exaggerate the risks to ordinary people. He says that the HIPAA de-identification protocols, if properly applied, leave no significant risk of re-identification. But even if that's the case, HIPAA processes are not applied to data across the board. The TLC data was described as "de-identified", and the fact that any people at all (even stand-out celebrities) could be re-identified from the data does create a broad basis for concern - "de-identified" is not what it seems. Barth-Jones stresses that in the TLC case, the de-identification was fatally flawed [technically: it's no use hashing data like registration numbers with limited value ranges because the hashed values can be reversed by brute force] but my point is this: who among us can tell the difference between poorly de-identified and "properly" de-identified?
And how long can "properly de-identified" last? What does it mean to say casually that only a "minuscule proportion" of data can be re-identified? In this case, the re-identification of celebrities was helped by the fact that lots of photos of them are readily available on social media; yet there are so many photos in the public domain now that regular people are going to become easier to identify too.
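The brute-force point can be made concrete with a small sketch. Everything in it is an assumption for illustration: the digit-letter-digit-digit identifier format is hypothetical, and SHA-256 is used only as a stand-in hash. The weakness lies in the small number of possible inputs, not in the particular hash function.

```python
import hashlib
from itertools import product
from string import ascii_uppercase, digits


def candidate_ids():
    """Yield every identifier in a hypothetical digit-letter-digit-digit format, e.g. '5X55'."""
    for d1, letter, d2, d3 in product(digits, ascii_uppercase, digits, digits):
        yield f"{d1}{letter}{d2}{d3}"


def build_reverse_table():
    """Hash every possible identifier once and map each hash back to its input."""
    return {hashlib.sha256(c.encode()).hexdigest(): c for c in candidate_ids()}


table = build_reverse_table()

# What a "de-identified" release would publish in place of the identifier.
published = hashlib.sha256(b"5X55").hexdigest()

print(len(table))        # 26000 candidates in this toy format
print(table[published])  # 5X55
```

With only 26,000 possible inputs, the whole lookup table is built in a fraction of a second on an ordinary laptop, after which every hashed value in the release can be reversed by a dictionary lookup.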
But my purpose here is not to play what-if games, and I know Daniel advocates statistically rigorous measures of identifiability. We agree on that -- in fact, over the years, we have agreed on most things. The point I am trying to make in this blog post is that, just as nobody should exaggerate the risk of re-identification, nor should anyone play it down. Claims of de-identification are made almost daily for really crucial datasets, like compulsorily retained metadata, public health data, biometric templates, social media activity used for advertising, and web searches. Some of these claims are made with statistical rigor, using formal standards like the HIPAA protocols; but other times the claim is casual, made with no qualification, with the aim of comforting end users.
"De-identified" is a helluva promise to make, with far-reaching ramifications. Daniel says de-identification researchers use the term with caution, knowing there are technical qualifications around the finite probability of individuals remaining identifiable. But my position is that the fine print doesn't translate to the general public who only hear that a database is "anonymous". So I am afraid the term "de-identified" is meaningless outside academia, and in casual use is misleading.
Barth-Jones objects to the conclusion that "it's virtually impossible to anonymise large data sets" but in an absolute sense, that claim is surely true. If any proportion of people in a dataset may be identified, then that data set is plainly not "anonymous". Moreover, as statistics and mathematical techniques (like facial recognition) improve, and as more ancillary datasets (like social media photos) become accessible, the proportion of individuals who may be re-identified will keep going up. [Readers who wish to pursue these matters further should look at the recent Harvard Law School online symposium on "Re-identification Demonstrations", hosted by Michelle Meyer, in which Daniel Barth-Jones and I participated, among many others.]
Both sides of this vexed debate need more nuance. Privacy advocates have no wish to quell medical research per se, nor do they call for absolute privacy guarantees, but we do seek full disclosure of the risks, so that the cost-benefit equation is understood by all. One of the obvious lessons in all this is that "anonymous" or "de-identified" on their own are poor descriptions. We need tools that meaningfully describe the probability of re-identification. If statisticians and medical researchers take "de-identified" to mean "there is an acceptably small probability, namely X percent, of identification" then let's have that fine print. Absent the detail, lay people can be forgiven for thinking re-identification isn't going to happen. Period.
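To illustrate what such fine print might look like, here is a minimal sketch of one simple measure: the share of records that are unique on a chosen set of quasi-identifiers, together with the smallest group size (the "k" of k-anonymity). The records and field names are made up, and this is not the HIPAA protocol or any formal standard; it only shows that a number can be attached to the word "de-identified".

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return (share of records unique on the quasi-identifiers, smallest group size k)."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    group_sizes = Counter(keys)
    unique = sum(1 for key in keys if group_sizes[key] == 1)
    return unique / len(records), min(group_sizes.values())


# Hypothetical records with names already stripped out.
records = [
    {"postcode": "2000", "birth_year": 1970, "sex": "F"},
    {"postcode": "2000", "birth_year": 1970, "sex": "F"},
    {"postcode": "2000", "birth_year": 1985, "sex": "M"},
    {"postcode": "2010", "birth_year": 1985, "sex": "M"},
]

share_unique, k = reidentification_risk(records, ["postcode", "birth_year", "sex"])
print(f"{share_unique:.0%} of records are unique; k = {k}")
```

In this toy dataset half the records are unique on postcode, birth year and sex, so anyone holding those three facts about a person could single out their record even though no name appears in it.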
And we need policy and regulatory mechanisms to curb inappropriate re-identification. Anonymity is a brittle, essentially temporary, and inadequate privacy tool.
I argue that the act of re-identification ought to be treated as an act of Algorithmic Collection of PII, and regulated as just another type of collection, albeit an indirect one. If a statistical process results in a person's name being added to a hitherto anonymous record in a database, it is as if the data custodian went to a third party and asked them "do you know the name of the person this record is about?". The fact that the data custodian was clever enough to avoid having to ask anyone about the identity of people in the re-identified dataset does not alter the privacy responsibilities arising. If the effect of an action is to convert anonymous data into personally identifiable information (PII), then that action collects PII. And in most places around the world, any collection of PII automatically falls under privacy regulations.
It looks like we will never guarantee anonymity, but the good news is that for privacy, we don't actually need to. Privacy is the protection you need when your affairs are not anonymous, for privacy is a regulated state where organisations that have knowledge about you are restrained in what they do with it. Equally, the ability to de-anonymise should be restricted in accordance with orthodox privacy regulations. If a party chooses to re-identify people in an ostensibly de-identified dataset, without a good reason and without consent, then that party may be in breach of data privacy laws, just as they would be if they collected the same PII by conventional means like questionnaires or surveillance.
Surely we can all agree that re-identification demonstrations serve to shine a light on the comforting claims made by governments, for instance, that certain citizen datasets can be anonymised. In Australia, the government is now implementing telecommunications metadata retention laws, in the interests of national security; the metadata, we are told, is de-identified and "secure". In the UK, the National Health Service plans to make de-identified patient data available to researchers. Whatever the merits of data mining in diverse fields like law enforcement and medical research, my point is that any government's claims of anonymisation must be treated critically (if not skeptically), and subjected to strenuous and ongoing privacy impact assessment.
Privacy, like security, can never be perfect. Privacy advocates must avoid giving the impression that they seek unrealistic guarantees of anonymity. There must be more to privacy than identity obscuration (to use a more technically correct term than "de-identification"). Medical research should proceed on the basis of reasonable risks being taken in return for beneficial outcomes, with strong sanctions against abuses including unwarranted re-identification. And then there wouldn't need to be a moral panic over re-identification if and when it does occur, because anonymity, while highly desirable, is not essential for privacy in any case.
The State Of Identity Management in 2015
Constellation Research recently launched the "State of Enterprise Technology" series of research reports. These assess the current enterprise innovations which Constellation considers most crucial to digital transformation, and provide snapshots of the future usage and evolution of these technologies.
My second contribution to the state-of-the-state series is "Identity Management Moves from Who to What". Here's an excerpt from the report:
In spite of all the fuss, personal identity is not usually important in routine business. Most transactions are authorized according to someone’s credentials, membership, role or other properties, rather than their personal details. Organizations actually deal with many people in a largely impersonal way. People don’t often care who someone really is before conducting business with them. So in digital Identity Management (IdM), one should care less about who a party is than what they are, with respect to attributes that matter in the context we’re in. This shift in focus is coming to dominate the identity landscape, for it simplifies a traditionally multi-disciplined problem set. Historically, the identity management community has made too much of identity!
Six Digital Identity Trends for 2015
1. Mobile becomes the center of gravity for identity. The mobile device brings convergence for a decade of progress in IdM. For two-factor authentication, the cell phone is its own second factor, protected against unauthorized use by PIN or biometric. Hardly anyone ever goes anywhere without their mobile - service providers can increasingly count on that without disenfranchising many customers. Best of all, the mobile device itself joins authentication to the app, intimately and seamlessly, in the transaction context of the moment. And today’s phones have powerful embedded cryptographic processors and key stores for accurate mutual authentication, and mobile digital wallets, as Apple’s Tim Cook highlighted at the recent White House Cyber Security Summit.
2. Hardware is the key – and holds the keys – to identity. Despite the lure of the cloud, hardware has re-emerged as pivotal in IdM. All really serious security and authentication takes place in secure dedicated hardware, such as SIM cards, ATMs, EMV cards, and the new Trusted Execution Environment mobile devices. Today’s leading authentication initiatives, like the FIDO Alliance, are intimately connected to standard cryptographic modules now embedded in most mobile devices. Hardware-based identity management has arrived just in the nick of time, on the eve of the Internet of Things.
3. The “Attributes Push” will shift how we think about identity. In the words of Andrew Nash, CEO of Confyrm Inc. (and previously the identity leader at PayPal and Google), “Attributes are at least as interesting as identities, if not more so.” Attributes are to identity as genes are to organisms – they are really what matters about you when you’re trying to access a service. By fractionating identity into attributes and focusing on what we really need to reveal about users, we can enhance privacy while automating more and more of our everyday transactions.
The Attributes Push may recast social logon. Until now, Facebook and Google have been widely tipped to become “Identity Providers”, but even these giants have found federated identity easier said than done. A dark horse in the identity stakes – LinkedIn – may take the lead with its superior holdings in verified business attributes.
4. The identity agenda is narrowing. For 20 years, brands and organizations have obsessed about who someone is online. And even before we’ve solved the basics, we over-reached. We've seen entrepreneurs trying to monetize identity, and identity engineers trying to convince conservative institutions like banks that “Identity Provider” is a compelling new role in the digital ecosystem. Now at last, the IdM industry agenda is narrowing toward more achievable and more important goals - precise authentication instead of general identification.
5. A digital identity stack is emerging. The FIDO Alliance and others face a challenge in shifting and improving the words people use in this space. Words, of course, matter, as do visualizations. IdM has suffered for too long under loose and misleading metaphors. One of the most powerful abstractions in IT was the OSI networking stack. A comparable sort of stack may be emerging in IdM.
6. Continuity will shape the identity experience. Continuity will make or break the user experience as the lines blur between real world and virtual, and between the Internet of Computers and the Internet of Things. But at the same time, we need to preserve clear boundaries between our digital personae, or else privacy catastrophes await. “Continuous” (also referred to as “Ambient”) Authentication is a hot new research area, striving to provide more useful and flexible signals about the instantaneous state of a user at any time. There is an explosion in devices now that can be tapped for Continuous Authentication signals, and by the same token, rich new apps in health, lifestyle and social domains, running on those very devices, that need seamless identity management.
A snapshot of my report "Identity Moves from Who to What" is available for download at Constellation Research. It expands on the points above, and sets out recommendations for enterprises to adopt the latest identity management thinking.
The media gets excited about gene therapy. With the sequencing of genomes becoming ever cheaper and more accessible, a grand vision of gene therapy is now being put about all too casually by futurists, in which defective genetic codes are simply edited out and replaced by working ones. At the same time there is a broader idea of "Precision Medicine" which envisages doctors scanning your entire DNA blueprint, instantly spotting the defects that ail you, and ordering up a set of customized pharmaceuticals precisely fitted to your biochemical idiosyncrasies.
There is more to gene therapy -- genetic engineering of live patients -- than the futurists let on.
A big question for mine is this: How, precisely, will the DNA repairs be made? Lay people might be left to presume it's like patching your operating system, which is not a bad metaphor, until you think a bit more about how and where patches are made to a computer.
A computer has one copy of any given software, stored in long term memory. And operating systems come with library functions for making updates. Patching software involves arriving with a set of corrections in a file, and requesting via APIs that the corrections be slotted into the right place, replacing the defective code.
But DNA doesn't work like this. While the genome is indeed something of an operating system, that's not the whole story. Sub-systems for making changes to the genome are not naturally built into an organism, because genes are only supposed to change at the time the software is installed. Our genomes are carved up en masse when germ cells (eggs and sperm) are made, and the genomes are put back together when we have sex, and then passed into our children. There is no part of the genetic operating system that allows selected parts of the genetic source code to be edited later, and -- this is the crucial bit -- spread through a living organism.
Genetic engineering, such as it is today, involves editing the genomes of embryos at a very early stage of their lifecycle, so the changes propagate as the embryo grows. Thus we have tomatoes fitted with arctic fish genes to stave off cold, and canola that resists pesticides. But the idea that's presented of gene therapy is very different; it has to impose changes to the genome in all the trillions of copies of the code in every cell in a fully developed organism. You see, there's another crucial thing about the DNA-is-software metaphor: there is no central long term program memory for our genes. Instead the DNA program is instantiated in every single cell of the body.
To change the DNA in a mature cell, geneticists have to edit it by means other than sexual reproduction. As I noted, there is no natural "API" for doing this, so they've invented a clever trick, co-opting viruses - nature's DNA hackers. Viruses work by squeezing their minuscule bodies through the cell walls of a host organism, latching onto DNA strands inside, and crudely adding their own code fragments, pretty much at random, into the host's genome. Viruses are designed (via evolution) to inject arbitrary genes into another organism's DNA (arbitrary relative to the purpose of the host's DNA, that is). Viruses are just what gene therapists need to edit faulty DNA in situ.
I know a bit about cystic fibrosis and the visions for a genetic cure. The faulty gene that causes CF was identified decades ago and its effect on chloride chemistry is well understood. By disrupting the way chloride ions are handled in cells, CF ruins mucous membranes, with particularly bad results for the lungs and digestive system. From the 1980s, it was thought that repairs to the CF gene could be delivered to cells in the lung lining by an engineered virus carried in an aerosol. Because only a small fraction of cells exposed to the virus could have their genes so updated, scientists expected that the repairs would be both temporary and partial, and that fresh viruses would need to be delivered every few weeks, a period determined by the rate at which lung cells die and get replaced.
Now please think about the tacit promises of gene therapy today. The story we hear is essentially all about the wondrous informatics and the IT. Within a few years we're told doctors will be able to sequence a patient's entire genome for a few dollars in a few minutes, using a desk top machine in the office. It's all down to Moore's Law and computer technology. There's an assumption that as the power goes up and the costs go down, geneticists will in parallel work out what all the genes mean, including how they interact, and develop a catalog of known faults and logical repairs.
Let's run with that optimism (despite the fact that just a few years ago they found that "Junk DNA" turns out to be active in ways that were not predicted; it's a lot like Dark Matter - important, ubiquitous and mysterious). The critical missing piece of the gene therapy story is how the patches are going to be made. Some reports imply that a whole clean new genome can be synthesised and somehow installed in the patient. Sorry, but how?
For thirty years they've tried and failed to rectify the one cystic fibrosis gene in readily accessible lung cells. Now we're supposed to believe that whole stretches of DNA are going to be swapped out in all the cells of the body? It's vastly harder than the CF problem, on at least three dimensions: (1) the numbers and complexity of the genes involved, (2) the numbers of cells and tissue systems that need to be patched all at once, and (3) the delivery mechanism for getting modified viruses (I guess) where they need to do their stuff.
It's so easy being a futurist. People adore your vision, and you don't need to worry about practicalities. The march of technology, seen with 20/20 hindsight, appears to make all dreams come true. Practicalities are left to sort themselves out.
But I think it takes more courage to say, of gene therapy, it's not going to happen.
I have just updated my periodic series of research reports on the FIDO Alliance. The fourth report, "FIDO Alliance Update: On Track to a Standard" is available at Constellation Research (for free for a time).
The Identity Management industry leader publishes its protocol specifications at v1.0, launches a certification program, and attracts support in Microsoft Windows 10.
The FIDO Alliance is the fastest-growing Identity Management (IdM) consortium we have seen. Comprising technology vendors, solutions providers, consumer device companies, and e-commerce services, the FIDO Alliance is working on protocols and standards to strongly authenticate users and personal devices online. With a fresh focus and discipline in this traditionally complicated field, FIDO envisages simply “doing for authentication what Ethernet did for networking”.
Launched in early 2013, the FIDO Alliance has now grown to over 180 members. Included are technology heavyweights like Google, Lenovo and Microsoft; almost every SIM and smartcard supplier; payments giants Discover, MasterCard, PayPal and Visa; several banks; and e-commerce players like Alibaba and Netflix.
FIDO is radically different from any IdM consortium to date. We all know how important it is to fix passwords: They’re hard to use, inherently insecure, and lie at the heart of most breaches. The Federated Identity movement seeks to reduce the number of passwords by sharing credentials, but this invariably confounds the relationships we have with services and complicates liability when more parties rely on fewer identities.
In contrast, FIDO’s mission is refreshingly clear: Take the smartphones and devices most of us are intimately connected to, and use the built-in cryptography to authenticate users to services. A registered FIDO-compliant device, when activated by its user, can send verified details about the device and the user to service providers, via standardized protocols. FIDO leverages the ubiquity of sophisticated handsets and the tidal wave of smart things. The Alliance focuses on device level protocols without venturing to change the way user accounts are managed or shared.
The centerpieces of FIDO’s technical work are two protocols, called UAF and U2F, for exchanging verified authentication signals between devices and services. Several commercial applications have already been released under the UAF and U2F specifications, including fingerprint-based payments apps from Alibaba and PayPal, and Google’s Security Key from Yubico. After a rigorous review process, both protocols are published now at version 1.0, and the FIDO Certified Testing program was launched in April 2015. And Microsoft announced that FIDO support would be built into Windows 10.
With its focus, pragmatism and membership breadth, FIDO is today’s go-to authentication standards effort. In this report, I look at what the FIDO Alliance has to offer vendors and end user communities, and its critical success factors.
I had a letter to the editor published in Nature on big data and privacy.
Nature 519, 414 (26 March 2015) doi:10.1038/519414a
Published online 25 March 2015
Letter as published
Privacy issues around data protection often inspire over-engineered responses from scientists and technologists. Yet constraints on the use of personal data mean that privacy is less about what is done with information than what is not done with it. Technology such as new algorithms may therefore be unnecessary (see S. Aftergood, Nature 517, 435–436; 2015).
Technology-neutral data-protection laws afford rights to individuals with respect to all data about them, regardless of the data source. More than 100 nations now have such data-privacy laws, typically requiring organizations to collect personal data only for an express purpose and not to re-use those data for unrelated purposes.
If businesses come to know your habits, your purchase intentions and even your state of health through big data, then they have the same privacy responsibilities as if they had gathered that information directly by questionnaire. This is what the public expects of big-data algorithms that are intended to supersede cumbersome and incomplete survey methods. Algorithmic wizardry is not a way to evade conventional privacy laws.
Constellation Research, Sydney, Australia.
Yawn. Alexander Nazaryan in Newsweek (March 22) has penned yet another tirade against privacy.
His column is all strawman. No one has ever said privacy is more important than other rights and interests. The infamous Right to be Forgotten is a case in point -- the recent European ruling is expressly about balancing competing interests, around privacy and public interest. All privacy rules and regulations, our intuitions and habits, all concede there may be over-riding factors in the mix.
So where on earth do the author and his editors get the following shrill taglines from?
- "You’re 100% Wrong About Privacy"
- "Our expectation of total online privacy is unrealistic and dangerous"
- "Total privacy is a dangerous delusion".
It is so tiresome that we advocates have to keep correcting grotesque misrepresentations of our credo. The right to be let alone was recognised in American law 125 years ago, and was written into the UN International Covenant on Civil and Political Rights in 1966. Every generation witnesses again the rhetorical question "Is Privacy Dead?" (see Newsweek, 27 July 1970). The answer, after fifty years, is still "no". The very clear trend worldwide is towards more privacy regulation, not less.
Funnily enough, Nazaryan makes a case for privacy himself, when he reminds us by-the-by that "the feds do covertly collect data about us, often with the complicity of high-tech and telecom corporations" and that "any user of Google has to expect that his/her information will be used for commercial gain". Most reasonable people look to privacy to address such ugly imbalances!
Why are critics of privacy so coldly aggressive? If Nazaryan feels no harm comes from others seeing him searching porn, then we might all admire his confidence. But is it any of his business what the rest of us do in private? Or the government's business, or Google's?
Privacy is just a fundamental matter of restraint. People should only have their personal information exposed on a need-to-know basis. Individuals don't have to justify their desire for privacy! The onus must be on the watchers to justify their interests.
Why do Alexander Nazaryan and people of his ilk so despise privacy? I wonder what political or commercial agendas they have to hide?
The Australian Payments Clearing Association (APCA) releases card fraud statistics every six months for the preceding 12-month period. Lockstep monitors these figures and plots the trend data. We got a bit too busy in 2014 and missed the last couple of APCA releases, so this blog is a catch-up, summarising and analysing stats from calendar year 2013 and AU financial year 2014 (July 2013 to June 2014).
In the 12 months to June 2014,
- Total card fraud rose by 22% to A$321 million
- Card Not Present (CNP) fraud rose 27% to A$256 million
- CNP fraud now represents 80% of all card fraud.
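As a quick check on these figures, the short calculation below recomputes the CNP share and backs out the approximate prior-year amounts implied by the quoted percentage rises. The prior-year numbers are derived here from rounded percentages, so they are estimates rather than APCA's published figures.

```python
total_fraud = 321.0  # A$ million, 12 months to June 2014
cnp_fraud = 256.0    # A$ million, same period

# CNP fraud as a share of all card fraud.
cnp_share = cnp_fraud / total_fraud

# Undo the quoted rises (22% and 27%) to estimate the previous year's amounts.
implied_prior_total = total_fraud / 1.22
implied_prior_cnp = cnp_fraud / 1.27

print(f"CNP share of card fraud: {cnp_share:.1%}")
print(f"Implied prior-year total fraud: about A${implied_prior_total:.0f} million")
print(f"Implied prior-year CNP fraud: about A${implied_prior_cnp:.0f} million")
```

The share comes out just under 80%, consistent with the rounded figure quoted above.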
APCA is one of the major payments systems regulators in Australia. It has only ever had two consistent things to say about Card Not Present fraud. First, it reassures the public that CNP fraud is only rising because online shopping is rising, implying that it's really not a big deal. Second, APCA produces advice for shoppers and merchants to help them stay safe online.
I suppose that in the 1950s and 60s, when the road toll started rising dramatically and car makers were called on to improve safety, the auto industry might have played down that situation like APCA does with CNP fraud. "Of course the road toll is high" they might have said; "it's because so many people love driving!". Fraud is not a necessary part of online shopping; at some point payments regulators will have to tell us, as a matter of policy, what level of fraud they think is actually reasonable, and start to press the industry to take action. In absolute terms, CNP fraud has ballooned by a factor of 10 in the past eight years. The way it's going, annual online fraud might overtake the cost of car theft (currently $680 million) before 2020.
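The arithmetic behind that projection can be sketched as follows. The one loud assumption is that the FY2014 growth rate of 27% simply continues unchanged each year; real-world growth will of course vary, so this is a back-of-envelope illustration rather than a forecast.

```python
import math

cnp_fraud = 256.0       # A$ million, FY2014 (from the APCA figures above)
car_theft_cost = 680.0  # A$ million, as quoted in the text
growth = 0.27           # assumption: the FY2014 growth rate continues unchanged

# Whole years of compounding needed for CNP fraud to exceed the cost of car theft.
years_needed = math.ceil(math.log(car_theft_cost / cnp_fraud) / math.log(1 + growth))

# Average annual growth rate implied by a tenfold rise over eight years.
implied_cagr = 10 ** (1 / 8) - 1

print(f"Crossover after {years_needed} years, i.e. FY{2014 + years_needed}")
print(f"Tenfold in eight years implies about {implied_cagr:.1%} growth per year")
```

Under that assumption the crossover falls in FY2019, and a tenfold rise over eight years corresponds to average growth of roughly a third per year.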
As for APCA's advice for shoppers to stay safe online, most of it is nearly useless. In their Christmas 2014 media release (PDF), APCA suggested:
Consumers can take simple steps to help stay safe when shopping online including:
- Only providing their card details on secure websites – looking for the locked padlock.
- Always keeping their PC security software up-to-date and doing a full scan often.
The truth is very few payment card details are stolen from websites or people's computers. Organised crime targets the databases of payment processors and big merchants, where they steal the details of tens of millions of cardholders at once. Four of the biggest ever known credit card breaches occurred in the last 18 months (Ref: DataLossDB):
- 109,000,000 credit cards - Home Depot, September 2014
- 110,000,000 credit cards - Target, December 2013
- 145,000,000 credit cards - eBay, May 2014
- 152,000,000 credit cards - Adobe, Oct 2013.
In its latest Data Breach Investigations Report, Verizon states that "2013 may be remembered as ... a year of transition to large-scale attacks on payment card systems".
Verizon has plotted the trends in data breaches at different sources; it's very clear that servers (where the data is held) have always been the main target of cybercriminals, and are getting proportionally more attention year on year. Diagram from Verizon Data Breach Investigations Report 2014.
So APCA's advice to look for website padlocks and keep anti-virus up-to-date - as important as that may be - won't do much at all to curb payment card theft or fraud. You might never have shopped online in your life, and still have your card details stolen, behind your back, at a department store breach.
Over the course of a dozen or more card fraud reports, APCA has had an on-again-off-again opinion of the credit card scheme's flagship CNP security measure, 3D Secure. In FY2011 (after CNP fraud went up 46%), APCA said "retailers should be looking at a 3D Secure solution for their online checkout". Then in their FY2012 media release, as losses kept increasing, they made no mention of 3D Secure at all.
Calendar year 2012 saw Australian CNP fraud fall for the first time ever, and APCA was back on the 3D Secure bandwagon, reporting that "The drop in CNP fraud can largely be attributed to an increase in the use of authentication tools such as MasterCard SecureCode and Verified by Visa, as well as dedicated fraud prevention tools."
Sadly, it seems 2012 was a blip. Online fraud for FY2014 (PDF) has returned to the long term trend. It's impossible to say what impact 3D Secure has really had in Australia, but penetration and consumer awareness of this technology remains low. It was surprising that APCA previously rushed to attribute a short-term drop in fraud to 3D Secure; that now seems overly optimistic, with CNP frauds continuing to mount after all.
In my view, it beggars belief the payments industry has yet to treat CNP fraud as seriously as it did skimming and carding. Technologically, CNP fraud is not a hard problem. It's just the digital equivalent of analogue skimming and carding, and it could be stopped just as effectively by using chips to protect cardholder data, just as they do in Card Present payments, whether by EMV card or NFC mobile devices.
In 2012, I published a short paper on this: Calling for a Uniform Approach to Card Fraud Offline and On (PDF).
The credit card payments system is a paragon of standardisation. No other industry has such a strong history of driving and adopting uniform technologies, infrastructure and business processes. No matter where you keep a bank account, you can use a globally branded credit card to go shopping in almost every corner of the world. The universal Four Party settlement model, and a long-standing card standard that works the same with ATMs and merchant terminals everywhere underpin seamless convenience. So with this determination to facilitate trustworthy and supremely convenient spending in every corner of the earth, it’s astonishing that the industry is still yet to standardise Internet payments. We settled on the EMV standard for in-store transactions, but online we use a wide range of confusing and largely ineffective security measures. As a result, Card Not Present (CNP) fraud is growing unchecked.
This article argues that all card payments should be properly secured using standardised hardware. In particular, CNP transactions should use the very same EMV chip and cryptography as do card present payments.
In May 2014, the European Court of Justice (ECJ) ruled that under European law, people have the right to have certain information about them delisted from search engine results. The ECJ ruling was called the "Right to be Forgotten", despite it having little to do with forgetting (c'est la vie). Shortened as RTBF, it is also referred to more clinically as the "Right to be Delisted" (or simply as "Google Spain" because that was one of the parties in the court action). Within just a few months, the RTBF has triggered conferences, public debates, and a TEDx talk.
Google itself did two things very quickly in response to the RTBF ruling. First, it mobilised a major team to process delisting requests. This is no mean feat -- over 200,000 requests have been received to date; see Google's transparency report. However, it's not surprising they got going so quickly, as they already have well-practiced processes for take-down notices for copyright and unlawful material.
Secondly, the company convened an Advisory Council of independent experts to formulate strategies for balancing the competing rights and interests bound up in RTBF. The Advisory Council delivered its report in January; it's available online here.
I declare I'm a strong supporter of RTBF. I've written about it here and here, and participated in an IEEE online seminar. I was impressed by the intellectual and eclectic make-up of the Council, which includes a past European Justice Minister, law professors, and a philosopher. And I do appreciate that the issues are highly complex. So I had high expectations of the Council's report.
Yet I found it quite barren.
Recap - the basics of RTBF
EU Justice Commissioner Martine Reicherts in a speech last August gave a clear explanation of the scope of the ECJ ruling, and acknowledged its nuances. Her speech should be required reading. Reicherts summed up the situation thus:
- What did the Court actually say on the right to be forgotten? It said that individuals have the right to ask companies operating search engines to remove links with personal information about them – under certain conditions - when information is inaccurate, inadequate, irrelevant, outdated or excessive for the purposes of data processing. The Court explicitly ruled that the right to be forgotten is not absolute, but that it will always need to be balanced against other fundamental rights, such as the freedom of expression and the freedom of the media – which, by the way, are not absolute rights either.
Everyone concerned acknowledges there are tensions in the RTBF ruling. The Google Advisory Council Report mentions these tensions (in Section 3) but sadly spends no time critically exploring them. In truth, all privacy involves conflicting requirements, and to that extent, many features of RTBF have been seen before. At p5, the Report mentions that "the [RTBF] Ruling invokes a data subject’s right to object to, and require cessation of, the processing of data about himself or herself" (emphasis added); the reader may conclude, as I have, that the computing of search results by a search engine is just another form of data processing.
One of the most important RTBF talking points is whether it's fair that Google is made to adjudicate delisting requests. I have some sympathies for Google here, and yet this is not an entirely novel situation in privacy. A standard feature of international principles-based privacy regimes is the right of individuals to have erroneous personal data corrected (this is, for example, OECD Privacy Principle No. 7 - Individual Participation, and Australian Privacy Principle No. 13 - Correction of Personal Information). And at the top of p5, the Council Report cites the right to have errors rectified. So it is standard practice that a data custodian must have means for processing access and correction requests. Privacy regimes expect there to be dispute resolution mechanisms too, operated by the company concerned. None of this is new. What seems to be new to some stakeholders is the idea that computing the results of a search engine is just another type of data processing.
A little rushed
The Council explains in the Introduction to the Report that it had to work "on an accelerated timeline, given the urgency with which Google had to begin complying with the Ruling once handed down". I am afraid that the Report shows signs of being a little rushed.
- There are several spelling errors.
- The contributions from non-English speakers could have done with some editing.
- Less trivially, many of the footnotes need editing; it's not always clear how a person's footnoted quote supports the text.
- More importantly, the Advisory Council surely operated with Terms of Reference, yet there is no clear explanation of what those were. At the end of the introduction, we're told the group was "convened to advise on criteria that Google should use in striking a balance, such as what role the data subject plays in public life, or whether the information is outdated or no longer relevant. We also considered the best process and inputs to Google’s decision making, including input from the original publishers of information at issue, as potentially important aspects of the balancing exercise." I'm surprised there is not a more complete and definitive description of the mission.
- It's not actually clear what sort of search we're all talking about. Not until p7 of the Report does the qualified phrase "name-based search" first appear. Are there other types of search for which the RTBF does not apply?
- Above all, it's not clear that the Council has reached a proper conclusion. The Report makes a number of suggestions in passing, and there is a collection of "ideas" at the back for improving the adjudication process, but there is no cogent set of recommendations. That may be because the Council didn't actually reach consensus.
And that's one of the most surprising things about the whole exercise. Of the eight independent Council members, five of them wrote "dissenting opinions". The work of an expert advisory committee is not normally framed as a court-like determination, from which members might dissent. And even if it was, to have the majority of members "dissent" casts doubt on the completeness or even the constitution of the process. Is there anything definite to be dissented from?
Jimmy Wales, the Wikipedia founder and chair, was especially strident in his individual views at the back of the Report. He referred to "publishers whose works are being suppressed" (p27 of the Report), and railed against the report itself, calling its recommendation "deeply flawed due to the law itself being deeply flawed". Can he mean the entire Charter of Fundamental Rights of the EU and European Convention on Human Rights? Perhaps Wales is the sort of person who denies there are any nuances in privacy, because "suppressed" is an exaggeration if we accept that RTBF doesn't cause anything to be forgotten. In my view, it poisons the entire effort when unqualified insults are allowed to be hurled at the law. If Wales thinks so little of the foundation of both the ECJ ruling and the Advisory Council, he might have declined to take part.
A little hollow
Strangely, the Council's Report is altogether silent on the nature of search. It's such a huge part of their business that I have to think the strength of Google's objection to RTBF is energised by some threat it perceives to its famously secret algorithms.
The Google business was founded on its superior PageRank search method, and the company has spent fantastic funds on R&D, allowing it to keep a competitive edge for a very long time. And the R&D continues. Curiously, just as everyone is debating RTBF, Google researchers published a paper about a new "knowledge based" approach to evaluating web pages. Surely if page ranking were less arbitrary and more transparent, a lot of the heat would come out of RTBF.
Of all the interests to balance in RTBF, Google's business objectives are actually a legitimate part of the mix. Google provides marvelous free services in exchange for data about its users which it converts into revenue, predominantly through advertising. It's a value exchange, and it need not be bad for privacy. A key component of privacy is transparency: people have a right to know what personal information about them is collected, and why. The RTBF analysis seems a little hollow without frank discussion of what everyone gets out of running a search engine.
- The European Court of Justice Right to be Forgotten ruling.
- Academic commentary compiled by Julia Powles and Rebekah Larsen from Cambridge University.
- "Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources" by Xin Luna Dong et al, Google.
- My analysis of search and free speech, "Free search, a misnomer"
- My analysis of search as a Big Data process, "The rite to be forgotten".
A few weeks ago, Samsung was universally condemned for collecting ambient conversations through their new voice recognition Smart TV. Yet the new Hello Barbie is much worse.
Integrating natural language processing technology from ToyTalk, Mattel's high tech update to the iconic doll is said to converse with children, learn their voices, and adapt the conversation in an intelligent way.
The companies say that of all the wishes they have had for Barbie, children have most longed to talk to her. So now they can; the question is, at what cost?
Mario Aguilar writing in Gizmodo considers that "voice recognition technology in Hello Barbie is pretty innocuous" because he takes Mattel's word for it that they won't use conversations they collect from kids for marketing. And he accepts ToyTalk's "statement" (which it seems has not been made public) that "Mattel will only use the conversations recorded through Hello Barbie to operate and improve our products, to develop better speech recognition for children, and to improve the natural language processing of children's speech."
It's bad enough that Samsung seems to expect Smart TV buyers will study a lengthy technical privacy statement, but do we really think it's reasonable for parents to make informed consent decisions around the usage of personal information collected from a doll?
Talk about childhood's loss of innocence.
Search engines are wondrous things. I myself use Google search umpteen times a day. I don't think I could work or play without it anymore. And yet I am a strong supporter of the contentious "Right to be Forgotten". The "RTBF" is hotly contested, and I am the first to admit it's a messy business. For one thing, it's not ideal that Google itself is required for now to adjudicate RTBF requests in Europe. But we have to accept that all of privacy is contestable. The balance of rights to privacy and rights to access information is tricky. RTBF has a long way to go, and I sense that European jurists and regulators are open and honest about this.
One of the starkest RTBF debating points is free speech. Does allowing individuals to have irrelevant, inaccurate and/or outdated search results blocked represent censorship? Is it an assault on free speech? There is surely a technical-legal question about whether the output of an algorithm represents "free speech", and as far as I can see, that question remains open. Am I the only commentator surprised by this legal blind spot? I have to say that such uncertainty destabilises a great deal of the RTBF dispute.
I am not a lawyer, but I have a strong sense that search outputs are not the sort of thing that constitutes speech. Let's bear in mind what web search is all about.
Google search is core to its multi-billion dollar advertising business. Search results are not unfiltered replicas of things found in the public domain, but rather the subtle outcome of complex Big Data processes. Google's proprietary search algorithm is famously secret, but we do know how sensitive it is to context. Most people will have noticed that search results change day by day and from place to place. But why is this?
When we enter search parameters, the result we get is actually Google's guess about what we are really looking for. Google in effect forms a hypothesis, drawing on much more than the express parameters, including our search history, browsing history, location and so on. And in all likelihood, search is influenced by the many other things Google gleans from the way we use its other properties -- Gmail, Maps, YouTube, Hangouts and Google+ -- which are all linked now under one master data usage policy.
And here's the really clever thing about search. Google monitors how well it's predicting our real or underlying concerns. It uses a range of signals and metrics to assess what we do with search results, and it continuously refines those processes. This is what Google really gets out of search: deep understanding of what its users are interested in, and how they are likely to respond to targeted advertising. Each search result is a little test of Google's Artificial Intelligence, which, as some like to say, is getting to know us better than we know ourselves.
As important as they are, it seems to me that search results are really just a by-product of a gigantic information business. They are nothing like free speech.