Lockstep


Simply Secure is not simply private

Another week, another security collaboration launch!

"Simply Secure" calls itself “a small but growing organization [with] expertise in usability research, design, software development, and product management". Their mission has to do with improving the security functions that built-in so badly in most software today. Simply Secure is backed by Google and Dropbox, and supported by a diverse advisory board.

It's early days (actually early day, singular) so it might be churlish to point out that Simply Secure's strategic messaging is a little uneven ... except that the words being used to describe it shed light on the clarity of the thinking.

My first exposure to Simply Secure came last night, when I read an article in the Guardian by Cory Doctorow (who is one of their advisers). Doctorow places enormous emphasis on privacy; the word "privacy" outnumbers "security" 16 to 3 in the body of his column. Another, admittedly shorter, report about the launch by The Next Web doesn't mention privacy at all. And then there's the Simply Secure blog post, which cites privacy a great deal but every single time in conjunction with security, as in "security and privacy". That repeated phrasing conveys, to me at least, some discomfort. As I say, it's early days and the team is doubtless sorting out how to weigh and progress these closely related objectives.

But I hope they do it quickly. On the face of it, Simply Secure might only scratch the surface of privacy.

Doctorow's Guardian article is mostly concerned with encryption and the terrible implementations that have plagued us since the dawn of the Internet. It's definitely important that we improve here – and radically. If the Simply Secure initiative does nothing but make encryption easier to integrate into commodity software, that would be a great thing. I'm all for it. But it won't necessarily or even probably lead to better privacy, because privacy is about restraint, not secrecy or anonymity.
As we go about our lives, we actually want to be known by others, but we want those who know us to be restrained in what they do with the knowledge they have about us. Privacy is the protection you need when your affairs are not secret.

I know Doctorow knows this – I've seen his terrific little speech on the steps at Comic-Con about PRISM. So I'm confused by his focus on cryptography.

How far does encryption get us? If we're using social networks, or if we're shopping and opting in to loyalty programs or select targeted marketing, or if we're sharing our medical records with relatives, medicos, hospitals and researchers, then encryption becomes moot. We need mechanisms to restrain what the receivers of our personal information do with it. We all know the business model at work behind "free" online services; using encryption to protect privacy in social networking, for instance, would be like using an armoured van to deliver your valuables to Bernie Madoff.

Another limitation of user-centric or user-managed encryption has to do with Big Data. A great deal of personal information about us is created and collected unseen behind our backs, by sensors, and by analytics processes that manage to work out who we are by linking disparate data streams together. How could Simply Secure ameliorate those sorts of problems? If the Simply Secure vision includes encryption at rest as well as in transit, then how will the user control or even see all the secondary uses of their encrypted personal information?

There's a combativeness in Doctorow's explanation of Simply Secure and in his tweets from yesterday on the topic. His aim is expressly to thwart the surveillance state, which in his view includes a symbiosis (if not a conspiracy) between government and internet companies, where the former gets their dirty work done by the latter. I'm sure he and I find that equally abhorrent. But I argue the proper response to these egregious behaviours is political, not technological (and political in the broad sense; I love that Snowden talks as much about accountability, legal processes, transparency and research as he does about encryption). If you think the government is exploiting the exploiters, then DIY encryption is a pretty narrow counter-measure. This is not the sort of society we want to live in, so let's work to change the establishment, rather than try to take it on in a crypto shoot-out.

Yes, security technology is important, but it's not nearly as important for privacy as the Rule of Law. Data privacy regimes instil restraint. The majority of businesses come to know that they are not at liberty to over-collect personal information, nor to re-use personal information unexpectedly and without consent. A minority of organisations flout data privacy principles, for example by slyly refining raw data into valuable personal knowledge, exploiting the trust citizens and users put in them. Some of these outfits flourish in the United States – the Canary Islands of privacy. Worldwide, the policing of privacy is patchy indeed, yet there have been spectacular legal victories in Europe and elsewhere against the excessive practices of very big companies: Facebook, with its biometric data mining of photo albums, and Google, with its driftnet harvesting of traffic from unencrypted Wi-Fi networks.

Pragmatically, I'm afraid encryption is a fragile privacy measure. Once secrecy is penetrated, we need regulations to stem exploitation of our personal information.

By all means, let's improve cryptographic engineering and I wish the Simply Secure initiative all the best. So long as they don't call security privacy.

Posted in Security, Privacy, Language, Big Data

New Paper Coming: The collision between Big Data and privacy law

I have a new academic paper due to be published in October, in the Australian Journal of Telecommunications and the Digital Economy. Here is an extract.

Update: see Telecommunications Society members page.

The collision between Big Data and privacy law

Abstract

We live in an age where billionaires are self-made on the back of the most intangible of assets – the information they have about us. The digital economy is awash with data. It's a new and endlessly re-usable raw material, increasingly left behind by ordinary people going about their lives online. Many information businesses proceed on the basis that raw data is up for grabs; if an entrepreneur is clever enough to find a new vein of it, they can feel entitled to tap it in any way they like. However, some tacit assumptions underpinning today's digital business models are naive. Conventional data protection laws, older than the Internet, limit how Personal Information is allowed to flow. These laws turn out to be surprisingly powerful in the face of 'Big Data' and the 'Internet of Things'. On the other hand, orthodox privacy management was not framed for new Personal Information being synthesised tomorrow from raw data collected today. This paper seeks to bridge a conceptual gap between data analytics and privacy, and sets out extended Privacy Principles to better deal with Big Data.

Introduction

'Big Data' is a broad term capturing the extraction of knowledge and insights from unstructured data. While data processing and analysis are as old as computing, the term 'Big Data' has recently attained special meaning, thanks to the vast rivers of raw data that course unseen through the digital economy, and the propensity for entrepreneurs to tap that resource for their own profit, or to build new analytic tools for enterprises. Big Data represents one of the biggest challenges to privacy and data protection society has seen. Never before has so much Personal Information been available so freely to so many.

Big Data promises vast benefits for a great many stakeholders (Michael & Miller 2013: 22-24) but the benefits may be jeopardized by the excesses of a few overly zealous businesses. Some online business models are propelled by a naive assumption that data in the 'public domain' is up for grabs. Many think the law has not kept pace with technology, but technologists often underestimate the strength of conventional data protection laws and regulations. In particular, technology neutral privacy principles are largely blind to the methods of collection, and barely distinguish between directly and indirectly collected data. As a consequence, the extraction of Personal Information from raw data constitutes an act of collection and as such is subject to longstanding privacy statutes. Privacy laws such as that of Australia don't even use the words 'public' and 'private' to qualify the data flows concerned (Privacy Act 1988).

On the other hand, orthodox privacy policies and static data usage agreements do not cater for the way Personal Information can be synthesised tomorrow from raw data collected today. Privacy management must evolve to become more dynamic, instead of being preoccupied with unwieldy policy documents and simplistic technical notices about cookies.

Thus the fit between Big Data and data privacy standards is complex and sometimes surprising. While existing laws are not to be underestimated, there is a need for data privacy principles to be extended, to help individuals remain abreast of what's being done with information about them, and to foster transparency regarding the new ways for personal information to be generated.

Conclusion: Making Big Data privacy real

A Big Data dashboard like the one described could serve several parallel purposes in aid of progressive privacy principles. It could reveal dynamically to users what PII can be collected about them through Big Data; it could engage users in a fair and transparent value-for-PII exchange; and it could enable dynamic consent, where users are able to opt in to Big Data processes, and opt out and in again, over time, as their understanding of the PII bargain evolves.

Big Data holds big promises, for the benefit of many. There are grand plans for population-wide electronic health records, new personalised financial services that leverage massive retail databases, and electricity grid management systems that draw on real-time consumption data from smart meters in homes, to extend the life of aging 'poles and wires' while reducing greenhouse gas emissions. The value to individuals and operators alike of these programs is amplified as computing power grows, new algorithms are researched, and more and more data sets are joined together. Likewise, the privacy risks are compounded. The potential value of Personal Information in the modern Big Data landscape cannot be represented in a static business model, and neither can the privacy pros and cons be captured in a fixed policy document. New user interfaces and visualisations like a 'Big Data dashboard' are needed to bring dynamic extensions to traditional privacy principles, and help people appreciate and intelligently negotiate the insights that can be extracted about them from the raw material that is data.

Posted in Privacy, Big Data

Schrödinger's Privacy: A Master Class

Master Class: How to Protect Your Customer's Digital Identity and Personal Data

A Social Media Week Sydney event #SMWSydney
Law Lounge, Sydney University Law School
New Law School Building
Eastern Ave, Camperdown
Fri, Sep 26, 10:00 AM – 11:30 AM

How can you navigate privacy fact and fiction, without the geeks and lawyers boring each other to death?

It's often said that technology has outpaced privacy law. Many digital businesses seem empowered by this brash belief. And so they proceed with apparent impunity to collect and monetise as much Personal Information as they can get their hands on.

But it's a myth!

Some of the biggest corporations in the world, including Google and Facebook, have been forcefully brought to book by privacy regulations. So, we have to ask ourselves:

  • what does privacy law really mean for social media in Australia?
  • is privacy "good for business"?
  • is privacy "not a technology issue"?
  • how can digital businesses navigate fact & fiction, without their geeks and lawyers boring each other to death?

In this Social Media Week Master Class I will:

  • unpack what's "creepy" about certain online practices
  • show how to rate data privacy issues objectively
  • analyse classic misadventures with geolocation, facial recognition, and predicting when shoppers are pregnant
  • critique photo tagging and crowd-sourced surveillance
  • explain why Snapchat is worth more than three billion dollars
  • analyse the regulatory implications of Big Data, Biometrics, Wearables and The Internet of Things.

We couldn't have timed this Master Class better, coming two weeks after the announcement of the Apple Watch, which will figure prominently in the class!

So please come along, for a fun and in-depth look at social media, digital technology, the law, and decency.

Register here.

About the presenter

Steve Wilson is a technologist who stumbled into privacy 12 years ago. He rejected those well-meaning slogans (like "Privacy Is Good For Business!") and instead dug into the relationships between information technology and information privacy. Now he researches and develops design patterns to help sort out privacy alongside all the other competing requirements of security, cost, usability and revenue. His latest publications include:

  • "The collision between Big Data and privacy law" due out in October in the Australian Journal of Telecommunications and the Digital Economy.

Posted in Social Networking, Social Media, Privacy, Internet, Biometrics, Big Data

Privacy watch

Update 22 September 2014

Last week, Apple suddenly went from silent to expansive on privacy, and the thrust of my blog straight after the Apple Watch announcement is now wrong. Apple posted a letter from CEO Tim Cook at www.apple.com/privacy, along with a document that sets out how "We've built privacy into the things you use every day".

The paper is very interesting. It's a sophisticated and balanced account of policy, business strategy and technology elements that go to create privacy. Apple highlights that they:

  • forswear the exploitation of customer data
  • do not scan content or messages
  • do not let their small "iAd" business take data from other Apple departments
  • require certain privacy-protective practices on the part of their health app developers.

They have also provided quite decent information about how Siri and health data are handled.

Apple's stated privacy posture is all about respect and self-restraint. Setting out these principles and commitments is a very welcome development indeed. I congratulate them.

Today Apple launched their much anticipated wrist watch, described by CEO Tim Cook as "the most personal device they have ever developed". He got that right!

Rather more than a watch, it's a sort of guardian angel. The Apple Watch has Siri built-in, along with new haptic sensors and buzzers, a heartbeat monitor, accelerometer, and naturally the GPS and Wi-Fi geolocation capability to track your speed and position throughout the day. So they say "Apple Watch is an all-day fitness tracker and a highly advanced sports watch in a single device".

Apple Watch

The Apple Watch will be a paragon of digital disruption. To understand and master disruption today requires the coordination of mobility, Big Data, the cloud and user interfaces. These cannot be treated as isolated technologies, so when a company like Apple controls them all, at scale, real transformation follows.

Thus Apple is one of the few businesses that can make promises like this: "Over time, Apple Watch gets to know you the way a good personal trainer would". In this we hear echoes of the smarts that power Siri, and we are reminded that amid the novel intimacy we have with these devices, many serious privacy problems have yet to be resolved.

The Apple Event today was a play in four acts:
Act I: the iPhone 6 release;
Act II: Apple Pay launch;
Act III: the Apple Watch announcement;
Act IV: U2 played live and released their new album free on iTunes!

It was fascinating to watch the thematic differences across these acts. With Apple Pay, they stressed security and privacy; we were told about the Secure Element, the way card numbers are replaced by random numbers (tokenization), and an architecture where Apple cannot see how much you spend nor where you spend it. On the other hand, when it came to the Apple Watch and its integrated health sensors, privacy wasn't mentioned, not at all. We are left to deduce that aggregating personal health data at Apple's servers is part of a broader plan.
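
For readers new to tokenization, here is a minimal conceptual sketch in Python -- emphatically not Apple's actual scheme, whose details are proprietary -- showing the general idea: the merchant only ever handles a random surrogate, and only the token vault can map it back to the real card number.

```python
# A conceptual sketch of payment tokenization, not Apple Pay's real design.
# The merchant and any eavesdropper see only a random token; the mapping back
# to the actual card number (PAN) lives solely in the token vault.
import secrets

class TokenVault:
    """Holds the only mapping from surrogate tokens back to real card numbers."""
    def __init__(self):
        self._vault = {}

    def tokenize(self, pan: str) -> str:
        token = secrets.token_hex(8)   # random surrogate, no relation to the PAN
        self._vault[token] = pan
        return token

    def detokenize(self, token: str) -> str:
        return self._vault[token]      # only the issuer/network side can do this

vault = TokenVault()
token = vault.tokenize("4111111111111111")   # a standard test card number
print("Merchant sees only:", token)          # the real PAN never leaves the vault
```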

The cornerstones of data privacy include Collection Limitation, Use Limitation (or "Purpose Specification") and Openness. Custodians of our Personally Identifiable Information (PII) should refrain from collecting and retaining PII they don't really need; they should specify what they do with PII and restrict unrelated secondary usage; and they should tell people what they're doing, generally in a Privacy Policy. With Siri, Apple sadly fails all these tests. (See Update 22 September 2014 above.)

The Apple Privacy Policy is altogether silent on Siri. The document details the sorts of information collected through its overt business processes like registration, sales and support, but it says nothing about the voice recordings and transcripts of Siri communications. Neither does the Siri FAQ mention what is done with all that data. It's quite an omission, seeing that when you dictate an SMS or an email to Siri, Apple retains a copy of communications that are normally out of bounds for your telecomms carrier.

It's been left to journalists to try and find out what Apple does with the information it mines from Siri. Wired magazine eventually discovered that Apple retains masked Siri voice recordings for six months; it then purportedly de-identifies them and keeps them for a further 18 months, for research. Yet even these explanations don't touch on the extracted contents of the communications, nor the metadata, like the trends and correlations that go to Siri's learning. If the purpose of Siri is ostensibly to automate the operation of the iPhone and its apps, then Apple should refrain from using the by-products of Siri's voice processing for anything else. But we just don't know what they do, and Apple imposes no self-restraint. (See Update 22 September 2014 above.)

We should hope for radically greater transparency with the Apple Watch and its health apps. Most of the watch's data processing and analytics will be carried out in the cloud. So Apple will come to hold detailed records of its users' exercise regimes, their performance figures, trend data and correlations. These are health records. Inevitably, health applications will take in other medical data, like food diaries entered by users, statistics imported from other databases, and detailed measurements from Internet-connected scales, blood pressure monitors and even medical devices. Apple will see what we're doing to improve our health, day by day, year on year. They will come to know more about what's making us healthy and what's not than we do ourselves.

Apple Watch Activity App

Now, the potential benefits from this sort of personal technology to self-managed care and preventative medicine are enormous. But so are the data management and privacy obligations.

Within the US, Apple will doubtless be taking steps to avoid falling under the stringent HIPAA regulations, yet in the rest of the world, a more subtle but far-reaching problem looms. Many broad-based data privacy regimes forbid the collection of health information without consent. And the laws of the European Union, Australia, New Zealand and elsewhere are generally technology neutral. This means that data collected directly from patients or doctors, and fresh data collected by way of automated algorithms, are treated essentially the same way. So when a sophisticated health management app running in the cloud somewhere mines all that exercise and lifestyle data, and starts to make inferences about health and wellbeing, great care needs to be taken that the individuals concerned know what's going on in advance, and have given their informed consent.

One of the deep privacy challenges in Big Data is that data miners don't know what they're going to find. Even with the best will in the world, a company can struggle to say in its Privacy Policy what PII it expects to extract (and thus collect) in future from the raw data it collects today. At Constellation Research we've been fleshing out a new sort of compact between businesses and individuals that seeks to keep users abreast of developments in data analytics, and promises to provide people with proper control of personal Big Data results.

It ought to be possible to expressly opt in to Big Data processes when you can understand the pros and cons and the net benefits, and to later opt out, and opt back in again, as the benefit equation shifts over time. But even visualising the products of Big Data is hard; I believe graphical user interfaces (GUIs) to allow people to comprehend and actively control the process will be one of the great software design problems of our age.
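
For a concrete feel of what dynamic consent could mean in software, here is a minimal sketch, assuming an append-only event log; the process name is hypothetical, and this illustrates the concept, not any product. The user's current state is simply whatever their latest recorded decision says.

```python
# A minimal sketch of dynamic consent: an append-only ledger of opt-in and
# opt-out events, so consent can change over time and the history is kept.
from datetime import datetime

class ConsentLedger:
    def __init__(self):
        self._events = []  # (timestamp, process, opted_in); never deleted

    def record(self, process: str, opted_in: bool) -> None:
        self._events.append((datetime.utcnow(), process, opted_in))

    def is_opted_in(self, process: str) -> bool:
        decisions = [e for e in self._events if e[1] == process]
        return decisions[-1][2] if decisions else False  # default: opted out

ledger = ConsentLedger()
ledger.record("fitness-trend-analytics", True)    # opt in (hypothetical process)
ledger.record("fitness-trend-analytics", False)   # opt out as the bargain shifts
print(ledger.is_opted_in("fitness-trend-analytics"))  # False, until they opt back in
```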

Apple are obviously preeminent in GUI and user experience innovation. You would think if anyone can create the novel yet intuitive interfaces desperately needed to control Big Data PII, Apple can. But first they will have to embrace their responsibilities for the increasingly intimate details they are helping themselves to. If the Apple Watch is "the most personal device they have ever developed", then let's see privacy and data protection commitments to match.

Posted in Privacy, e-health, Constellation Research, Cloud, Big Data

Postcard from Monterey #CISmcc

First Day Reflections from CIS Monterey.

Follow along on Twitter at #CISmcc (for the Monterey Conference Centre).

The Cloud Identity Summit really is the top event on the identity calendar. The calibre of the speakers, the relevance and currency of the material, the depth and breadth of the cohort, and the international spread are all unsurpassed. It's been great to meet old cyber-friends in person at last -- like Emma Lindley from the UK and Lance Peterman. And to catch up with such talented folks as Steffen Sorensen from New Zealand once again.

A day or two before, Ian Glazer of Salesforce asked in a tweet what we were expecting to get out of CIS. And I replied that I hoped to change my mind about something. It's unnerving to have your understanding and assumptions challenged by the best in the field ... OK, sometimes it's outright embarrassing ... but that's what these events are all about. A very wise lawyer said to me once, around 1999 at the dawn of e-commerce, that he had changed his mind about authentication a few times up to that point, and that he fully expected to change his mind again and again.

I spent most of Saturday in OpenID Foundation workshops. OIDF chair Don Thibeau enthusiastically stressed two new(ish) initiatives: Mobile Connect, in conjunction with the mobile carrier trade association GSM Association @GSMA, and HIE Connect for the health sector. For the uninitiated, HIE means Health Information Exchange, namely a hub for sharing structured e-health records among hospitals, doctors, pharmacists, labs, e-health records services, allied health providers, insurers, drug & device companies, researchers and carers; for the initiated, we know there is some language somewhere in which the letters H.I.E. stand for "Not My Lifetime".

But seriously, one of the best (and pleasantly surprising) things about HIE Connect, as the OIDF folks tell it, is the way its leaders unflinchingly take for granted the importance of privacy in the exchange of patient health records. Because honestly, privacy is not a given in e-health. There are champions on the new frontiers like genomics who actually say privacy may not be in the interests of the patients (or more's the point, the genomics businesses). And too many engineers, in my opinion, still struggle with privacy as something they can effect. So it's great -- and believe me, really not obvious -- to hear the HIE Connect folks -- including Debbie Bucci from the US Dept of Health and Human Services, and Justin Richer of MITRE and MIT -- dealing with it head-on. There is a compelling fit for the OAuth and OIDC protocols here, with their ability to manage discrete pieces of information about users (patients) and to permission them all separately. Having said that, Don and I agree that e-health records permissioning and consent is one of the great UI/UX challenges of our time.
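
As a sketch of what that fine-grained permissioning could look like, consider a standard OAuth 2.0 authorization request in which each category of health record is a separate scope, so the patient can grant each one independently. The endpoint, client and scope names below are hypothetical, not drawn from HIE Connect's actual specifications.

```python
# Sketch: an OAuth 2.0 authorization-code request (RFC 6749) with one scope
# per record type, so a patient can consent to each data category separately.
from urllib.parse import urlencode

AUTH_ENDPOINT = "https://hie.example.org/oauth/authorize"   # hypothetical HIE hub

params = {
    "response_type": "code",
    "client_id": "allied-health-app",                       # hypothetical client
    "redirect_uri": "https://app.example.org/callback",
    # One scope per record category -- the patient can approve immunisations
    # while withholding pathology, or vice versa:
    "scope": "records:immunisations.read records:pathology.read",
    "state": "af0ifjsldkj",                                 # CSRF protection
}

print(AUTH_ENDPOINT + "?" + urlencode(params))
```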

Justin also highlighted that the RESTful patterns emerging for fine-grained permissions management in healthcare are not confined to healthcare. Debbie added that the ability to query rare events without undoing privacy is also going to be a core defining challenge in the Internet of Things.

MyPOV: We may well see tremendous use cases for the fruits of HIE Connect outside healthcare before they're adopted within it!

In the afternoon, we heard from Canadian and British projects that have been working with the Open Identity Exchange (OIX) program now for a few years each.

Emma Lindley presented the work they've done in the UK Identity Assurance Program (IDAP) with social security entitlements recipients. These are not always the first types of users we think of for sophisticated IDAM functions, but in Britain, local councils see enormous efficiency dividends from speeding up the issuance of, for example, disabled parking permits, not to mention reducing imposters, who cost money and breed resentment towards the genuinely entitled. Emma said one Attributes Exchange beta project reduced the time taken to get a 'Blue Badge' permit from 10 days to 10 minutes. She went on to describe the new "Digital Sources of Trust" initiative, which promises to reconnect under-banked and under-documented sections of society with mainstream financial services. Emma told me the much-abused word "transformational" really does apply here.

MyPOV: The Digital Divide is an important issue for me, and I love to see leading edge IDAM technologies and business processes being used to do something about it -- and relatively quickly.

Then Andre Boysen of SecureKey led a discussion of the Canadian identity ecosystem, which he said has stabilised nicely around four players: Federal Government, Provincial Govt, Banks and Carriers. Lots of operations and infrastructure precedents from the payments industry have carried over.
Andre calls the smart driver license of British Columbia the convergence of "street identity and digital identity".

MyPOV: That's great news - and yet comparable jurisdictions like Australia and the USA still struggle to join governments, banks and carriers in an effective identity synthesis without creating great privacy and commercial anxieties. All three cultures are similarly allergic to identity cards, but only in Canada have they managed to supplement driver's licenses with digital identities that enjoy relatively high community acceptance. In nearly a decade, Australia has been at a standstill in its national understanding of smartcards and privacy.

For mine, the CIS Quote of the Day came from Scott Rice of the OpenID Foundation. We all know the stark problem in our industry of the under-representation of Relying Parties in the grand federated identity projects; IdPs and carriers so dominate IDAM. Scott asked us to imagine a situation where "the auto industry was driven by steel makers". Governments wouldn't put up with that for long.

Can someone give us the figures? I wonder if Identity and Access Management is already economically more important than cars?!

Cheers from Monterey, Day 1.

Posted in Smartcards, Security, Identity, Federated Identity, e-health, Cloud, Biometrics, Big Data

Webinar: Big Privacy

I'm presenting a Constellation Research webinar next week on my latest research into "Big Privacy" (June 18th in the US / June 19th in Australia). I hope you can join us.

Register here.

We live in an age where billionaires are self-made on the back of the most intangible of assets – the information they have amassed about us. That information used to be volunteered in forms and questionnaires and contracts but increasingly personal information is being observed and inferred.

The modern world is awash with data. It's a new and infinitely re-usable raw material. Most of the raw data about us is an invisible by-product of our mundane digital lives, left behind by the gigabyte by ordinary people who do not perceive it, let alone understand it.

Many Big Data and digital businesses proceed on the basis that all this raw data is up for grabs. There is a particularly widespread assumption that data in the "public domain" is free-for-all, and that if you're clever enough to grab it, you're entitled to extract whatever you can from it.

In the webinar, I'll try to show how some of these assumptions are naive. The public is increasingly alarmed about Big Data and averse to unbridled data mining. Excessive data mining isn't just subjectively 'creepy'; it can be objectively unlawful in many parts of the world. Conventional data protection laws turn out to be surprisingly powerful in the face of Big Data. Data miners ignore international privacy laws at their peril!

Today there are all sorts of initiatives trying to forge a new technology-privacy synthesis. They go by names like "Privacy Engineering" and "Privacy by Design". These are well-meaning efforts, but they can be a bit stilted. They typically overlook the strengths of conventional privacy law, and they can miss an opportunity to engage the engineering mind.

It's not politically correct, but I believe we must admit that privacy is full of contradictions and competing interests. We need to be more mature about privacy. Just as there is no such thing as perfect security, there can never be perfect privacy either. And this is where the professional engineering mindset should be brought in, to help deal with conflicting requirements.

If we’re serious about Privacy by Design and Privacy Engineering then we need to acknowledge the tensions. That’s some of the thinking behind Constellation's new Big Privacy compact. To balance privacy and Big Data, we need to hold a conversation with users that respects the stresses and strains, and involves them in working through the new privacy deal.

The webinar will cover these highlights of the Big Privacy compact:

    • Respect and Restraint
    • Super transparency
    • A fair deal for Personal Information.

Have a disruptive technology implementation story? Get recognised for your leadership. Apply for the 2014 SuperNova Awards for leaders in disruptive technology.

Posted in Social Media, Privacy, Constellation Research, Biometrics, Big Data

Watson the Doctor is no laughing matter

For the past year, oncologists at the Memorial Sloan Kettering Cancer Center in New York have been training IBM's Watson – the artificial intelligence tour-de-force that beat all comers on Jeopardy! – to help personalise cancer care. The Center explains that "combining [their] expertise with the analytical speed of IBM Watson, the tool has the potential to transform how doctors provide individualized cancer treatment plans and to help improve patient outcomes". Others are speculating already that Watson could "soon be the best doctor in the world".

I have no doubt that when Watson and things like it are available online to doctors worldwide, we will see overall improvements in healthcare outcomes, especially in parts of the world now under-serviced by medical specialists [having said that, the value of diagnosing cancer in poor developing nations is questionable if they cannot go on to treat it]. As with Google's self-driving car, we will probably get significant gains eventually, averaged across the population, from replacing humans with machines. Yet some of the foibles of computing are not well known and I think they will lead to surprises.

For all the wondrous gains made in Artificial Intelligence, where Watson is now the state of the art, A.I. remains algorithmic, and for that, it has inherent limitations that don't get enough attention. Computer scientists and mathematicians have known for generations that some surprisingly straightforward problems defy algorithmic solution. The Halting Problem is provably beyond any universal step-by-step codified procedure, and the Travelling Salesperson Problem, while solvable in principle, defeats every known efficient algorithm as it scales. If such simply stated challenges confound computation, we need to be more sober in our expectations of computerised intelligence.
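
To see why no general halting checker can exist, here is the classic diagonal argument sketched as Python. It is a sketch of a proof, not working machinery; the whole point is that a correct, always-terminating halts() cannot be written.

```python
def halts(program) -> bool:
    """Suppose this could always correctly decide whether program() halts."""
    ...  # Turing (1936): no such total, always-correct function can exist

def paradox():
    if halts(paradox):   # if the checker says paradox() halts...
        while True:      # ...then loop forever, contradicting the checker;
            pass
    # ...and if the checker says paradox() loops, return immediately: wrong
    # again. Either answer is falsified, so a general halts() is impossible.
```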

A key limitation of any programmed algorithm is that it must make its decisions using a fixed set of inputs that are known and fully characterised (by the programmer) at design time. If you spring an unexpected input on any computer, it can fail, and yet that's what life is all about -- surprises. No mathematician seriously claims that what humans do is somehow magic; most believe we are computers made of meat. Nevertheless, when paradoxes like the Halting Problem abound, we can be sure that computing and cognition are not what they seem. We should hope these conundrums are better understood before putting too much faith in computers doing deep human work.

And yet, predictably, futurists are jumping ahead to imagine "Watson apps" in which patients access the supercomputer for themselves. Even if there were reliable algorithms for doctoring, I reckon the "Watson app" is a giant step, because of the complex way the patient's conditions are assessed and data is gathered for the diagnosis. That is, the taking of the medical history.

In these days of billion dollar investments in electronic health records (EHRs), we tend to think that medical decisions are all about the data. When politicians announce EHR programs they often boast that patients won't have to go through the rigmarole of giving their history over and over again to multiple doctors as they move through an episode of care. This is actually a serious misunderstanding of the importance in clinical decision-making of the interaction between medico and patient when the history is taken. It's subtle. The things a patient chooses to tell, the things they seem to be hiding, and the questions that make them anxious, all guide an experienced medico when taking a history, and provide extra cues (metadata if you will) about the patient’s condition.

Now, Watson may well have the ability to navigate this complexity and conduct a very sophisticated Q&A. It will certainly have a vastly bigger and more reliable memory of cases than any doctor, and with that it can steer a dynamic patient questionnaire. But will Watson be good enough to be made available direct to patients through an app, with no expert human mediation? Or will a host of new input errors result from patients typing their answers into a smart phone or speaking into a microphone, without any face-to-face subtlety (let alone human warmth)? It was true of mainframes and it’s just as true of the best A.I.: Bulldust in, bulldust out.

Finally, Watson's existing linguistic limitations are not to be underestimated. It is surely not trivial that Watson struggles with puns and humour. Futurist Mark Pesce, when discussing Watson, remarked in passing that scientists don't understand the "quirks of language and intelligence" that create humour. The question of what makes us laugh does in fact occupy some of the finest minds in cognitive and social science. So we are a long way from being able to mechanise humour. And this matters because, for the foreseeable future, it puts a great deal of social intercourse beyond AI's reach.

In between the extremes of laugh-out-loud comedy and a doctor’s dry written notes lies a spectrum of expressive subtleties, like a blush, an uncomfortable laugh, shame, and the humiliation that goes with some patients’ lived experience of illness. Watson may understand the English language, but does it understand people?

Watson can answer questions, but good doctors ask a lot of questions too. When will this amazing computer be able to hold the sort of two-way conversation that we would call a decent "bedside manner"?

Have a disruptive technology implementation story? Get recognised for your leadership. Apply for the 2014 SuperNova Awards for leaders in disruptive technology.

Posted in Software engineering, Science, Language, e-health, Culture, Big Data

Three billion was a Snap

The latest Snowden revelations include the NSA's special programs for extracting photos from the Internet and identifying the people in them. Amongst other things, the NSA uses its vast information resources to correlate location cues in photos -- buildings, streets and so on -- with satellite data, to work out where people are. They even search especially for passport photos, because these are better fodder for facial recognition algorithms. The audacity of these government surveillance activities continues to surprise us, and their secrecy is abhorrent.

Yet an even greater scale of private sector surveillance has been going on for years in social media. With great pride, Facebook recently revealed its R&D in facial recognition. They showcased the brazenly named "DeepFace" biometric algorithm, which is claimed to be 97% accurate in recognising faces from regular images. Facebook has made a swaggeringly big investment in biometrics.

Data mining needs raw material, there's lots of it out there, and Facebook has been supremely clever at attracting it. It's been suggested that 20% of all photos now taken end up in Facebook. Even three years ago, Facebook held 10,000 times as many photographs as the Library of Congress:

Largest photo libraries
[Picture courtesy of the now retired 1000memories.com blog]

And Facebook will spend big buying other photo lodes. Last year they tried to buy Snapchat for the spectacular sum of three billion dollars. The figure had pundits reeling. How could a start-up company with 30 people be worth so much? All the usual dot com comparisons were made; the offer seemed a flight of fancy.

But no, the offer was a rational consideration for the precious raw material that lies buried in photo data.

Snapchat generates at least 100 million new images every day. Three billion dollars was, pardon me, a snap. I figure that at a ballpark internal rate of return of 10%, a $3B investment is equivalent to $300M p.a., so even if the Snapchat volume stopped growing, Facebook would have been paying under one cent for every new snap, in perpetuity.
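
For anyone who wants to check the envelope, the arithmetic runs like this, using the $3B offer, the ballpark 10% return and the 100 million snaps a day cited above:

```python
# Back-of-envelope for "under one cent per snap", from the figures above.
purchase = 3_000_000_000        # Facebook's reported offer for Snapchat, USD
rate_of_return = 0.10           # ballpark internal rate of return
snaps_per_day = 100_000_000     # Snapchat's daily volume at the time

annual_cost = purchase * rate_of_return              # ~$300M p.a.
cost_per_snap = annual_cost / (snaps_per_day * 365)
print(f"${cost_per_snap:.4f} per snap")              # ~$0.0082, i.e. under a cent
```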

These days, we have learned from Snowden and the NSA that communications metadata is just as valuable as the content of our emails and phone calls. So remember that it's the same with photos. Each digital photo comes from a device that embeds metadata within the image, usually including the time and place at which the picture was taken. And of course each Instagram or Snapchat is a social post, sent by an account holder with a history and rich context, in which the image yields intimate real-time information about what they're doing, when and where.
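
To see how routinely that metadata travels inside the file, here is a short sketch using the Pillow imaging library; the file name is hypothetical, and exactly which EXIF tags appear varies by device and app.

```python
# Sketch: reading the time-and-place (EXIF) metadata embedded in a typical
# digital photo. Requires Pillow (pip install Pillow).
from PIL import Image
from PIL.ExifTags import TAGS, GPSTAGS

img = Image.open("snap.jpg")          # hypothetical photo file
exif = img._getexif() or {}           # the commonly used EXIF accessor for JPEGs

for tag_id, value in exif.items():
    name = TAGS.get(tag_id, tag_id)
    if name == "DateTimeOriginal":
        print("Taken:", value)        # e.g. '2014:09:09 13:00:00'
    elif name == "GPSInfo":           # latitude/longitude of where it was shot
        gps = {GPSTAGS.get(k, k): v for k, v in value.items()}
        print("Where:", gps.get("GPSLatitude"), gps.get("GPSLongitude"))
```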

The hallmark of the Snapchat service is transience: all those snaps are supposed to flit from one screen to another before vaporising. Now of course that idea is contestable; enthusiasts worked out pretty quickly how to retrieve snaps from a phone's memory. And in any case, transience is a red herring, perhaps a deliberate distraction, because the metadata matters more, and Snapchat admits in its Privacy Policy that it pretty well keeps the lot:

  • When you access or use our Services, we automatically collect information about you, including:
  • Usage Information: When you send or receive messages via our Services, we collect information about these messages, including the time, date, sender and recipient of the Snap. We also collect information about the number of messages sent and received between you and your friends and which friends you exchange messages with most frequently.
  • Log Information: We log information about your use of our websites, including your browser type and language, access times, pages viewed, your IP address and the website you visited before navigating to our websites.
  • Device Information: We may collect information about the computer or device you use to access our Services, including the hardware model, operating system and version, MAC address, unique device identifier, phone number, International Mobile Equipment Identity ("IMEI") and mobile network information. In addition, the Services may access your device's native phone book and image storage applications, with your consent, to facilitate your use of certain features of the Services.
  • Location Information: With your consent, we may collect information about the location of your device to facilitate your use of certain features of our Services, determine the speed at which your device is traveling, add location-based filters to your Snaps (such as local weather), and for any other purpose described in this privacy policy.

Snapchat goes on to declare it may use any of this information to "personalize and improve the Services and provide advertisements, content or features that match user profiles or interests" and it reserves the right to share any information with "vendors, consultants and other service providers who need access to such information to carry out work on our behalf".

So back to the data mining: nothing stops Snapchat -- or a new parent company -- running biometric facial recognition over the snaps as they pass through the servers, to extract additional "profile" information. And there's an extra kicker that makes Snapchats especially valuable for biometric data miners: the vast majority of them are selfies. So if you extract a biometric template from a snap, you already know who it belongs to, without anyone having to tag it. Snapchat would provide a hundred million auto-calibrations every day for facial recognition algorithms! On Facebook, the privacy aware turn off photo tagging, but with Snapchats, self-identification is inherent to the experience and is unlikely ever to be disabled.
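
To illustrate how low the technical barrier has become, here is a sketch using the openly available face_recognition Python library (a wrapper around dlib). To be clear, this is my illustration, not Snapchat's or Facebook's actual pipeline, and the file names are hypothetical.

```python
# Sketch: extracting a biometric template from a selfie and matching it against
# a later photo. Requires face_recognition (pip install face_recognition).
import face_recognition

# A selfie is self-labelling: the template extracted from it is already tied
# to the sender's account, with no manual tagging step required.
selfie = face_recognition.load_image_file("alice_selfie.jpg")     # hypothetical
template = face_recognition.face_encodings(selfie)[0]             # 128-d face vector

# Any later image can now be matched against that stored template.
candidate = face_recognition.load_image_file("crowd_photo.jpg")   # hypothetical
for encoding in face_recognition.face_encodings(candidate):
    match = face_recognition.compare_faces([template], encoding)[0]
    print("Same person as the selfie:", match)
```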

NSA has all your selfies

As I've discussed before, the morbid thrill of Snowden's spying revelations has tended to overshadow his sober observation that, since surveillance by the state is probably inevitable, we need to be discussing accountability.

While we're all ventilating about the NSA, it's time we also attended to private sector spying and properly debated the restraints that may be appropriate on corporate exploitation of social data.

Personally I'm much more worried that an infomopoly has all my selfies.

Have a disruptive technology implementation story? Get recognised for your leadership. Apply for the 2014 SuperNova Awards for leaders in disruptive technology.

Posted in Social Networking, Social Media, Privacy, Biometrics, Big Data

Big Privacy: a new Constellation Research Report

I've just completed a major new Constellation Research report looking at how today's privacy practices cope with Big Data. The report draws together my longstanding research on the counter-intuitive strengths of technology-neutral data protection laws, and melds it with my new Constellation colleagues' vast body of work in data analytics. The synergy is honestly exciting and illuminating.

Big Data promises tremendous benefits for a great many stakeholders but the potential gains are jeopardised by the excesses of a few. Some cavalier online businesses are propelled by a naive assumption that data in the "public domain" is up for grabs, and with that they often cross a line.

For example, there are apps and services now that will try to identify pictures you take of strangers in public, by matching them biometrically against data supersets compiled from social networking sites and other publicly accessible databases. Many find such offerings quite creepy, but they may be at a loss as to what to do about it, or even how to think through the issues objectively. Yet the very metaphor of data mining holds some of the clues. If, as some say, raw data is like crude oil, just waiting to be mined and exploited by enterprising prospectors, then surely there are limits, akin to mining permits?

Many think the law has not kept pace with technology, and that digital innovators are free to do what they like with any data they can get their hands on. But technologists repeatedly underestimate the strength of conventional data protection laws and regulations. The extraction of PII from raw data may be interpreted under technology neutral privacy principles as an act of Collection, and as such is subject to existing statutes. Around the world, Google thus found they are not actually allowed to gather Personal Data that happens to be available in unencrypted Wi-Fi transmissions as Street View cars drive by homes and offices. And Facebook found they are not actually allowed to automatically identify people in photos through face recognition without consent. And Target probably would find, if they tried it outside the USA, that they cannot flag selected female customers as possibly pregnant by analysing their buying habits.

On the other hand, orthodox privacy policies and static user agreements do not cater for the way personal data can be conjured tomorrow from raw data collected today. Traditional privacy regimes require businesses to spell out what personally identifiable information (PII) they collect and why, and to restrict secondary usage. Yet with Big Data, with the best will in the world, a company might not know what data analytics will yield down the track. If mutual benefits for business and customer alike might be uncovered, a freeze-frame privacy arrangement may be counter-productive.

Thus the fit between data analytics and data privacy standards is complex and sometimes surprising. While existing laws are not to be underestimated, we do need something new. As far as I know it was Ray Wang in his Harvard Business Review blog who first called for a fresh privacy compact amongst users and businesses.

The spirit of data privacy is simply framed: organisations that know us should respect the knowledge they have, they should be open about what they know, and they should be restrained in what they do with it. In the Age of Big Data, let's have businesses respect the intelligence they extract from data mining, just as they should respect the knowledge they collect directly through forms and questionnaires.

I like the label "Big Privacy"; it is grandly optimistic, like "Big Data" itself, and at the same time implies a challenge to do better than regular privacy practices.

Ontario Privacy Commissioner Dr Ann Cavoukian writes about Big Privacy, describing it simply as "Privacy by Design writ large". But I think there must be more to it than that. Big Data is not just quantitatively but also qualitatively different from ordinary data analysis.

To summarise the basic elements of a Big Data compact:

  • Respect and Restraint: In the face of Big Data’s temptations, remember that privacy is not only about what we do with PII; just as important is what we choose not to do.
  • Super transparency: Who knows what lies ahead in Big Data? If data privacy means being open about what PII is collected and why, then advanced privacy means going further, telling people more about the business models and the sorts of results data mining is expected to return.
  • Engage customers in a fair deal for PII: Information businesses ought to set out what PII is really worth to them (especially when it is extracted in non-obvious ways from raw data) and offer a fair "price" for it, whether in the form of "free" products and services, or explicit payment.
  • Really innovate in privacy: There’s a common refrain that “privacy hampers innovation” but often that's an intellectually lazy cover for reserving the right to strip-mine PII. Real innovation lies in business practices which create and leverage PII while honoring privacy principles.

My report, "Big Privacy" Rises to the Challenges of Big Data may be downloaded from the Constellation Research website.

Posted in Social Media, Privacy, Constellation Research, Big Data

The strengths and weaknesses of Data Privacy in the Age of Big Data

This is the abstract of a current privacy conference proposal.

Synopsis

Many Big Data and online businesses proceed on a naive assumption that data in the "public domain" is up for grabs; technocrats are often surprised that conventional data protection laws can be interpreted to cover the extraction of PII from raw data. On the other hand, orthodox privacy frameworks don't cater for the way PII can be created in future from raw data collected today. This presentation will bridge the conceptual gap between data analytics and privacy, and offer new dynamic consent models to civilize the trade in PII for goods and services.

Abstract

It’s often said that technology has outpaced privacy law, yet by and large that's just not the case. Technology has certainly outpaced decency, with Big Data and biometrics in particular becoming increasingly invasive. However OECD data privacy principles set out over thirty years ago still serve us well. Outside the US, rights-based privacy law has proven effective against today's technocrats' most worrying business practices, based as they are on taking liberties with any data that comes their way. To borrow from Niels Bohr, technologists who are not surprised by data privacy have probably not understood it.

The cornerstone of data privacy in most places is the Collection Limitation principle, which holds that organizations should not collect Personally Identifiable Information beyond their express needs. It is the conceptual cousin of security's core Need-to-Know Principle, and the best starting point for Privacy-by-Design. The Collection Limitation principle is technology neutral and thus blind to the manner of collection. Whether PII is collected directly by questionnaire or indirectly via biometric facial recognition or data mining, data privacy laws apply.

It's not for nothing we refer to "data mining". But few of those unlicensed data gold diggers seem to understand that the synthesis of fresh PII from raw data (including the identification of anonymous records like photos) is merely another form of collection. The real challenge in Big Data is that we don't know what we'll find as we refine the techniques. With the best will in the world, it is hard to disclose in a conventional Privacy Policy what PII might be collected through synthesis down the track. The age of Big Data demands a new privacy compact between organisations and individuals. High-minded organizations will promise to keep people abreast of new collections and will offer ways to opt in, and out and in again, as the PII-for-service bargain continues to evolve.

Posted in Social Networking, Privacy, Biometrics, Big Data