Acknowledgement: Daniel Barth-Jones kindly engaged with me after this blog was initially published, and pointed out several significant factual errors, for which I am grateful.
In 2014, the New York Taxi & Limousine Company (TLC) released a large "anonymised" dataset containing 173 million taxi rides taken in 2013. Soon after, software developer Vijay Pandurangan managed to undo the hashed taxi registration numbers. Subsequently, privacy researcher Anthony Tockar went on to combine public photos of celebrities getting in or out of cabs, to recreate their trips. See Anna Johnston's analysis here.
This re-identification demonstration has been used by some to bolster a general claim that anonymity online is increasingly impossible.
On the other hand, medical research advocates like Columbia University epidemiologist Daniel Barth-Jones argue that the practice of de-identification can be robust and should not be dismissed as impractical on the basis of demonstrations such as this. The identifiability of celebrities in these sorts of datasets is a statistical anomaly reasons Barth-Jones and should not be used to frighten regular people out of participating in medical research on anonymised data. He wrote in a blog that:
- "However, it would hopefully be clear that examining a miniscule proportion of cases from a population of 173 million rides couldn’t possibly form any meaningful basis of evidence for broad assertions about the risks that taxi-riders might face from such a data release (at least with the taxi medallion/license data removed as will now be the practice for FOIL request data)."
As a health researcher, Barth-Jones is understandably worried that re-identification of small proportions of special cases is being used to exaggerate the risks to ordinary people. He says that the HIPAA de-identification protocols if properly applied leave no significant risk of re-id. But even if that's the case, HIPAA processes are not applied to data across the board. The TLC data was described as "de-identified" and the fact that any people at all (even stand-out celebrities) could be re-identified from data does create a broad basis for concern - "de-identified" is not what it seems. Barth-Jones stresses that in the TLC case, the de-identification was fatally flawed [technically: it's no use hashing data like registration numbers with limited value ranges because the hashed values can be reversed by brute force] but my point is this: who among us who can tell the difference between poorly de-identified and "properly" de-identified?
And how long can "properly de-identified" last? What does it mean to say casually that only a "minuscule proportion" of data can be re-identified? In this case, the re-identification of celebrities was helped by the fact lots of photos of them are readily available on social media, yet there are so many photos in the public domain now, regular people are going to get easier to be identified.
But my purpose here is not to play what-if games, and I know Daniel advocates statistically rigorous measures of identifiability. We agree on that -- in fact, over the years, we have agreed on most things. The point I am trying to make in this blog post is that, just as nobody should exaggerate the risk of re-identification, nor should anyone play it down. Claims of de-identification are made almost daily for really crucial datasets, like compulsorily retained metadata, public health data, biometric templates, social media activity used for advertising, and web searches. Some of these claims are made with statistical rigor, using formal standards like the HIPAA protocols; but other times the claim is casual, made with no qualification, with the aim of comforting end users.
"De-identified" is a helluva promise to make, with far-reaching ramifications. Daniel says de-identification researchers use the term with caution, knowing there are technical qualifications around the finite probability of individuals remaining identifiable. But my position is that the fine print doesn't translate to the general public who only hear that a database is "anonymous". So I am afraid the term "de-identified" is meaningless outside academia, and in casual use is misleading.
Barth-Jones objects to the conclusion that "it's virtually impossible to anonymise large data sets" but in an absolute sense, that claim is surely true. If any proportion of people in a dataset may be identified, then that data set is plainly not "anonymous". Moreover, as statistics and mathematical techniques (like facial recognition) improve, and as more ancillary datasets (like social media photos) become accessible, the proportion of individuals who may be re-identified will keep going up.[Readers who wish to pursue these matters further should look at the recent Harvard Law School online symposium on "Re-identification Demonstrations", hosted by Michelle Meyer, in which Daniel Barth-Jones and I participated, among many others.]
Both sides of this vexed debate need more nuance. Privacy advocates have no wish to quell medical research per se, nor do they call for absolute privacy guarantees, but we do seek full disclosure of the risks, so that the cost-benefit equation is understood by all. One of the obvious lessons in all this is that "anonymous" or "de-identified" on their own are poor descriptions. We need tools that meaningfully describe the probability of re-identification. If statisticians and medical researchers take "de-identified" to mean "there is an acceptably small probability, namely X percent, of identification" then let's have that fine print. Absent the detail, lay people can be forgiven for thinking re-identification isn't going to happen. Period.
And we need policy and regulatory mechanisms to curb inappropriate re-identification. Anonymity is a brittle, essentially temporary, and inadequate privacy tool.
I argue that the act of re-identification ought to be treated as an act of Algorithmic Collection of PII, and regulated as just another type of collection, albeit an indirect one. If a statistical process results in a person's name being added to a hitherto anonymous record in a database, it is as if the data custodian went to a third party and asked them "do you know the name of the person this record is about?". The fact that the data custodian was clever enough to avoid having to ask anyone about the identity of people in the re-identified dataset does not alter the privacy responsibilities arising. If the effect of an action is to convert anonymous data into personally identifiable information (PII), then that action collects PII. And in most places around the world, any collection of PII automatically falls under privacy regulations.
It looks like we will never guarantee anonymity, but the good news is that for privacy, we don't actually need to. Privacy is the protection you need when you affairs are not anonymous, for privacy is a regulated state where organisations that have knowledge about you are restrained in what they do with it. Equally, the ability to de-anonymise should be restricted in accordance with orthodox privacy regulations. If a party chooses to re-identify people in an ostensibly de-identified dataset, without a good reason and without consent, then that party may be in breach of data privacy laws, just as they would be if they collected the same PII by conventional means like questionnaires or surveillance.
Surely we can all agree that re-identification demonstrations serve to shine a light on the comforting claims made by governments for instance that certain citizen datasets can be anonymised. In Australia, the government is now implementing telecommunications metadata retention laws, in the interests of national security; the metadata we are told is de-identified and "secure". In the UK, the National Health Service plans to make de-identified patient data available to researchers. Whatever the merits of data mining in diverse fields like law enforcement and medical research, my point is that any government's claims of anonymisation must be treated critically (if not skeptically), and subjected to strenuous and ongoing privacy impact assessment.
Privacy, like security, can never be perfect. Privacy advocates must avoid giving the impression that they seek unrealistic guarantees of anonymity. There must be more to privacy than identity obscuration (to use a more technically correct term than "de-identification"). Medical research should proceed on the basis of reasonable risks being taken in return for beneficial outcomes, with strong sanctions against abuses including unwarranted re-identification. And then there wouldn't need to be a moral panic over re-identification if and when it does occur, because anonymity, while highly desirable, is not essential for privacy in any case.
Update 22 September 2014
Last week, Apple suddenly went from silent to expansive on privacy, and the thrust of my blog straight after the Apple Watch announcement is now wrong. Apple posted a letter from CEO Tim Cook at www.apple.com/privacy along with a document that sets outs how "We’ve built privacy into the things you use every day".
The paper is very interesting. It's a sophisticated and balanced account of policy, business strategy and technology elements that go to create privacy. Apple highlights that they:
- forswear the exploitation of customer data
- do not scan content or messages
- do not let their small "iAd" business take data from other Apple departments
- require certain privacy protective practices on the part of their health app developers.
They have also provided quite decent information about how Siri and health data is handled.
Apple's stated privacy posture is all about respect and self-restraint. Setting out these principles and commitments is a very welcome development indeed. I congratulate them.
Today Apple launched their much anticipated wrist watch, described by CEO Tim Cook as "the most personal device they have ever developed". He got that right!
Rather more than a watch, it's a sort of guardian angel. The Apple Watch has Siri built-in, along with new haptic sensors and buzzers, a heartbeat monitor, accelerometer, and naturally the GPS and Wi-Fi geolocation capability to track your speed and position throughout the day. So they say "Apple Watch is an all-day fitness tracker and a highly advanced sports watch in a single device".
The Apple Watch will be a paragon of digital disruption. To understand and master disruption today requires the coordination of mobility, Big Data, the cloud and user interfaces. These cannot be treated as isolated technologies, so when a company like Apple controls them all, at scale, real transformation follows.
Thus Apple is one of the few businesses that can make promises like this: "Over time, Apple Watch gets to know you the way a good personal trainer would". In this we hear echoes of the smarts that power Siri, and we are reminded that amid the novel intimacy we have with these devices, many serious privacy problems have yet to be resolved.
The Apple Event today was a play in four acts:
Act I: the iPhone 6 release;
Act II: Apple Pay launch;
Act III: the Apple Watch announcement;
Act IV: U2 played live and released their new album free on iTunes!
It was fascinating to watch the thematic differences across these stanzas. With Apple Pay, they stressed security and privacy; we were told about the Secure Element, the way card numbers are replaced by random numbers (tokenization), and an architecture where Apple cannot see how much you spend nor where you spend it. On the other hand, when it came to the Apple Watch and its integrated health sensors, privacy wasn't mentioned, not at all. We are left to deduce that aggregating personal health data at Apple's servers is a part of a broader plan.
With Siri, Apple sadly fails all these tests.See Update 22 September 2014 above.
It's been left to journalists to try and find out what Apple does with the information it mines from Siri. Wired magazine discovered eventually that Apple retains masked Siri voice recordings for six months; it then purportedly de-identifies them and keeps them for a further 18 months, for research. Yet even these explanations don't touch on the extracted contents of the communications, nor the metadata, like the trends and correlations that go to Siri's learning. If the purpose of Siri is ostensibly to automate the operation of the iPhone and its apps, then Apple should be refrain from using the by-products of Siri's voice processing for anything else.
But we just don't know what they do, and Apple imposes no self-restraint.See Update 22 September 2014 above.
We should hope for radically greater transparency with the Apple Watch and its health apps. Most of the watch's data processing and analytics will be carried out in the cloud. So Apple will come to hold detailed records of its users' exercise regimes, their performance figures, trend data and correlations. These are health records. Inevitably, health applications will take in other medical data, like food diaries entered by users, statistics imported from other databases, and detailed measurements from Internet-connected scales, blood pressure monitors and even medical devices. Apple will see what we're doing to improve our health, day by day, year on year. They will come to know more about what's making us healthy and what's not than we do ourselves.
Now, the potential benefits from this sort of personal technology to self-managed care and preventative medicine are enormous. But so are the data management and privacy obligations.
Within the US, Apple will doubtless be taking steps to avoid falling under the stringent HIPAA regulations, yet in the rest of the world, a more subtle but far-reaching problem looms. Many broad based data privacy regimes forbid the collection of health information without consent. And the laws of the European Union, Australia, New Zealand and elsewhere are generally technology neutral. This means that data collected directly from patients or doctors, and fresh data collected by way of automated algorithms are treated essentially the same way. So when a sophisticated health management app running in the cloud somewhere mines all that exercise and lifestyle data, and starts to make inferences about health and wellbeing, great care needs to be taken that the indiviuals concerned know what's going on in advance, and have given their informed consent.
It ought to be possible to expressly opt in to Big Data processes when you can understand the pros and cons and the net benefits, and to later opt out, and opt back in again, as the benefit equation shifts over time. But even visualising the products of Big Data is hard; I believe graphical user interfaces (GUIs) to allow people to comprehend and actively control the process will be one of the great software design problems of our age.
Apple are obviously preeminent in GUI and user experience innovation. You would think if anyone can create the novel yet intuitive interfaces desperately needed to control Big Data PII, Apple can. But first they will have to embrace their responsibilities for the increasingly intimate details they are helping themselves to. If the Apple Watch is "the most personal device they've ever designed" then let's see privacy and data protection commitments to match.
Second Day Reflections from CIS Monterey.
Follow along on Twitter at #CISmcc (for the Monterey Conference Centre).
The Attributes push
At CIS 2013 in Napa a year ago, several of us sensed a critical shift in focus amongst the identerati - from identity to attributes. OIX launched the Attributes Exchange Network (AXN) architecture, important commentators like Andrew Nash were saying, 'hey, attributes are more interesting than identity', and my own #CISnapa talk went so far as to argue we should forget about identity altogether. There was a change in the air, but still, it was all pretty theoretical.
Twelve months on, and the Attributes push has become entirely practical. If there was a Word Cloud for the NSTIC session, my hunch is that "attributes" would dominate over "identity". Several live NSTIC pilots are all about the Attributes.
ID.me is a new company started by US military veterans, with the aim of improving access for the veterans community to discounted goods and services and other entitlements. Founders Matt Thompson and Blake Hall are not identerati -- they're entirely focused on improving online access for their constituents to a big and growing range of retailers and services, and offer a choice of credentials for proving veterans bona fides. It's central to the ID.me model that users reveal as little as possible about their personal identities, while having their veterans' status and entitlements established securely and privately.
Another NSTIC pilot Relying Party is the financial service sector infrastructure provider Broadridge. Adrian Chernoff, VP for Digital Strategy, gave a compelling account of the need to change business models to take maximum advantage of digital identity. Broadridge recently announced a JV with Pitney Bowes called Inlet, which will enable the secure sharing of discrete and validated attributes - like name, address and social security number - in an NSTIC compliant architecture.
Yesterday I said in my #CISmcc diary that I hoped to change my mind about something here, and half way through Day 2, I was delighted it was already happening. I've got a new attitude about NSTIC.
Over the past six months, I had come to fear NSTIC had lost its way. It's hard to judge totally accurately when lurking on the webcast from Sydney (at 4:00am) but the last plenary seemed pedestrian to me. And I'm afraid to say that some NSTIC committees have got a little testy. But today's NSTIC session here was a turning point. Not only are there a number or truly exciting pilots showing real progress, but Jeremy Grant has credible plans for improving accountability and momentum, and the new technology lead Paul Grassi is thinking outside the box and speaking out of school. The whole program seems fresh all over again.
In a packed presentation, Grassi impressed me enormously on a number of points:
- Firstly, he advocates a pragmatic NSTIC-focused extension of the old US government Authentication Guide NIST SP 800-63. Rather than a formal revision, a companion document might be most realistic. Along the way, Grassi really nailed an issue which we identity professionals need to talk about more: language. He said that there are words in 800-63 that are "never used anywhere else in systems development". No wonder, as he says, it's still "hard to implement identity"!
- Incidentally I chatted some more with Andrew Hughes about language; he is passionate about terms, and highlights that our term "Relying Party" is an especially terrible distraction for Service Providers whose reason-for-being has nothing to do with "relying" on anyone!
- Secondly, Paul Grassi wants to "get very aggressive on attributes", including emphasis on practical measurement (since that's really what NIST is all about). I don't think I need to say anything more about that than Bravo!
- And thirdly, Grassi asked "What if we got rid of LOAs?!". This kind of iconoclastic thinking is overdue, and was floated as part of a broad push to revamp the way government's orthodox thinking on Identity Assurance is translated to the business world. Grassi and Grant don't say LOAs can or should be abandoned by government, but they do see that shoving the rounded business concepts of identity into government's square hole has not done anyone much credit.
Just one small part of NSTIC annoyed me today: the persistent idea that federation hubs are inherently simpler than one-to-one authentication. They showed the following classic sort of 'before and after' shots, where it seems self-evident that a hub (here the Federal Cloud Credential Exchange FCCX) reduces complexity. The reality is that multilateral brokered arrangements between RPs and IdPs are far more complex than simple bilateral direct contracts. And moreover, the new forms of agreements are novel and untested in real world business. The time and cost and unpredictability of working out these new arrangements is not properly accounted for and has often been fatal to identity federations.
The dog barks and this time the caravan turns around
One of the top talking points at #CISmcc has of course been FIDO. The FIDO Alliance goes from strength to strength; we heard they have over 130 members now (remember it started with four or five less than 18 months ago). On Saturday afternoon there was a packed-out FIDO show case with six vendors showing real FIDO-ready products. And today there was a three hour deep dive into the two flagship FIDO protocols UAF (which enables better sharing of strong authentication signals such that passwords may be eliminated) and U2F (which standardises and strengthens Two Factor Authentication).
FIDO's marketing messages are improving all the time, thanks to a special focus on strategic marketing which was given its own working group. In particular, the Alliance is steadily clarifying the distinction between identity and authentication, and sticking adamantly to the latter. In other words, FIDO is really all about the attributes. FIDO leaves identity as a problem to be addressed further up the stack, and dedicates itself to strengthening the authentication signal sent from end-point devices to servers.
The protocol tutorials were excellent, going into detail about how "Attestation Certificates" are used to convey the qualities and attributes of authentication hardware (such as device model, biometric modality, security certifications, elapsed time since last user verification etc) thus enabling nice fine-grained policy enforcement on the RP side. To my mind, UAF and U2F show how nature intended PKI to have been used all along!
Some confusion remains as to why FIDO has two protocols. I heard some quiet calls for UAF and U2F to converge, yet that would seem to put the elegance of U2F at risk. And it's noteworthy that U2F is being taken beyond the original one time password 2FA, with at least one biometric vendor at the showcase claiming to use it instead of the heavier UAF.
Surprising use cases
Finally, today brought more fresh use cases from cohorts of users we socially privileged identity engineers for the most part rarely think about. Another NSTIC pilot partner is AARP, a membership organization providing "information, advocacy and service" to older people, retirees and other special needs groups. AARP's Jim Barnett gave a compelling presentation on the need to extend from the classic "free" business models of Internet services, to new economically sustainable approaches that properly protect personal information. Barnett stressed that "free" has been great and 'we wouldn't be where we are today without it' but it's just not going to work for health records for example. And identity is central to that.
There's so much more I could report if I had time. But I need to get some sleep before another packed day. All this changing my mind is exhausting.
Cheers again from Monterey.
First Day Reflections from CIS Monterey.
Follow along on Twitter at #CISmcc (for the Monterey Conference Centre).
The Cloud Identity Summit really is the top event on the identity calendar. The calibre of the speakers, the relevance and currency of the material, the depth and breadth of the cohort, and the international spread are all unsurpassed. It's been great to meet old cyber-friends in "XYZ Space" at last -- like Emma Lindley from the UK and Lance Peterman. And to catch up with such talented folks like Steffen Sorensen from New Zealand once again.
A day or two before, Ian Glazer of Salesforce asked in a tweet what we were expecting to get out of CIS. And I replied that I hoped to change my mind about something. It's unnerving to have your understanding and assumptions challenged by the best in the field ... OK, sometimes it's outright embarrassing ... but that's what these events are all about. A very wise lawyer said to me once, around 1999 at the dawn of e-commerce, that he had changed his mind about authentication a few times up to that point, and that he fully expected to change his mind again and again.
I spent most of Saturday in Open Identity Foundation workshops. OIDF chair Don Thibeau enthusiastically stressed two new(ish) initiatives: Mobile Connect in conjunction with the mobile carrier trade association GSM Association @GSMA, and HIE Connect for the health sector. For the uninitiated, HIE means Health Information Exchange, namely a hub for sharing structured e-health records among hospitals, doctors, pharmacists, labs, e-health records services, allied health providers, insurers, drug & device companies, researchers and carers; for the initiated, we know there is some language somewhere in which the letters H.I.E. stand for "Not My Lifetime".
But seriously, one of the best (and pleasantly surprising) things about HIE Connect as the OIDF folks tell it, is the way its leaders unflinchingly take for granted the importance of privacy in the exchange of patient health records. Because honestly, privacy is not a given in e-health. There are champions on the new frontiers like genomics that actually say privacy may not be in the interests of the patients (or more's the point, the genomics businesses). And too many engineers in my opinion still struggle with privacy as something they can effect. So it's great -- and believe me, really not obvious -- to hear the HIE Connects folks -- including Debbie Bucci from the US Dept of Health and Human Services, and Justin Richer of Mitre and MIT -- dealing with it head-on. There is a compelling fit for the OAUTH and OIDC protocols here, with their ability to manage discrete pieces of information about users (patients) and to permission them all separately. Having said that, Don and I agree that e-health records permissioning and consent is one of the great UI/UX challenges of our time.
Justin also highlighted that the RESTful patterns emerging for fine-grained permissions management in healthcare are not confined to healthcare. Debbie added that the ability to query rare events without undoing privacy is also going to be a core defining challenge in the Internet of Things.
MyPOV: We may well see tremendous use cases for the fruits of HIE Exchange before they're adopted in healthcare!
In the afternoon, we heard from Canadian and British projects that have been working with the Open Identity Exchange (OIX) program now for a few years each.
Emma Lindley presented the work they've done in the UK Identity Assurance Program (IDAP) with social security entitlements recipients. These are not always the first types of users we think of for sophisticated IDAM functions, but in Britain, local councils see enormous efficiency dividends from speeding up the issuance of eg disabled parking permits, not to mention reducing imposters, which cost money and lead to so much resentment of the well deserved. Emma said one Attributes Exchange beta project reduced the time taken to get a 'Blue Badge' permit from 10 days to 10 minutes. She went on to describe the new "Digital Sources of Trust" initiative which promises to reconnect under-banked and under-documented sections of society with mainstream financial services. Emma told me the much-abused word "transformational" really does apply here.
MyPOV: The Digital Divide is an important issue for me, and I love to see leading edge IDAM technologies and business processes being used to do something about it -- and relatively quickly.
Then Andre Boysen of SecureKey led a discussion of the Canadian identity ecosystem, which he said has stabilised nicely around four players: Federal Government, Provincial Govt, Banks and Carriers. Lots of operations and infrastructure precedents from the payments industry have carried over.
Andre calls the smart driver license of British Columbia the convergence of "street identity and digital identity".
MyPOV: That's great news - and yet comparable jurisdictions like Australia and the USA still struggle to join governments and banks and carriers in an effective identity synthesis without creating great privacy and commercial anxieties. All three cultures are similarly allergic to identity cards, but only in Canada have they managed to supplement drivers licenses with digital identities with relatively high community acceptance. In nearly a decade, Australia has been at a standstill in its national understanding of smartcards and privacy.
For mine, the CIS Quote of the Day came from Scott Rice of the Open ID Foundation. We all know the stark problem in our industry of the under-representation of Relying Parties in the grand federated identity projects. IdPs and carriers so dominate IDAM. Scott asked us to imagine a situation where "The auto industry was driven by steel makers". Governments wouldn't put up with that for long.
Can someone give us the figures? I wonder if Identity and Access Management is already more economically ore important than cars?!
Cheers from Monterey, Day 1.
For the past year, oncologists at the Memorial Sloan Kettering Cancer Centre in New York have been training IBM’s Watson – the artificial intelligence tour-de-force that beat allcomers on Jeopardy – to help personalise cancer care. The Centre explains that "combining [their] expertise with the analytical speed of IBM Watson, the tool has the potential to transform how doctors provide individualized cancer treatment plans and to help improve patient outcomes". Others are speculating already that Watson could "soon be the best doctor in the world".
I have no doubt that when Watson and things like it are available online to doctors worldwide, we will see overall improvements in healthcare outcomes, especially in parts of the world now under-serviced by medical specialists [having said that, the value of diagnosing cancer in poor developing nations is questionable if they cannot go on to treat it]. As with Google's self-driving car, we will probably get significant gains eventually, averaged across the population, from replacing humans with machines. Yet some of the foibles of computing are not well known and I think they will lead to surprises.
For all the wondrous gains made in Artificial Intelligence, where Watson now is the state-of-the art, A.I. remains algorithmic, and for that, it has inherent limitations that don't get enough attention. Computer scientists and mathematicians have know for generations that some surprisingly straightforward problems have no algorithmic solution. That is, some tasks cannot be accomplished by any universal step-by-step codified procedure. Examples include the Halting Problem and the Travelling Salesperson Problem. If these simple challenges have no algorithm, we need be more sober in our expectations of computerised intelligence.
A key limitation of any programmed algorithm is that it must make its decisions using a fixed set of inputs that are known and fully characterised (by the programmer) at design time. If you spring an unexpected input on any computer, it can fail, and yet that's what life is all about -- surprises. No mathematician seriously claims that what humans do is somehow magic; most believe we are computers made of meat. Nevertheless, when paradoxes like the Halting Problem abound, we can be sure that computing and cognition are not what they seem. We should hope these conundrums are better understood before putting too much faith in computers doing deep human work.
And yet, predictably, futurists are jumping ahead to imagine "Watson apps" in which patients access the supercomputer for themselves. Even if there were reliable algorithms for doctoring, I reckon the "Watson app" is a giant step, because of the complex way the patient's conditions are assessed and data is gathered for the diagnosis. That is, the taking of the medical history.
In these days of billion dollar investments in electronic health records (EHRs), we tend to think that medical decisions are all about the data. When politicians announce EHR programs they often boast that patients won't have to go through the rigmarole of giving their history over and over again to multiple doctors as they move through an episode of care. This is actually a serious misunderstanding of the importance in clinical decision-making of the interaction between medico and patient when the history is taken. It's subtle. The things a patient chooses to tell, the things they seem to be hiding, and the questions that make them anxious, all guide an experienced medico when taking a history, and provide extra cues (metadata if you will) about the patient’s condition.
Now, Watson may well have the ability to navigate this complexity and conduct a very sophisticated Q&A. It will certainly have a vastly bigger and more reliable memory of cases than any doctor, and with that it can steer a dynamic patient questionnaire. But will Watson be good enough to be made available direct to patients through an app, with no expert human mediation? Or will a host of new input errors result from patients typing their answers into a smart phone or speaking into a microphone, without any face-to-face subtlety (let alone human warmth)? It was true of mainframes and it’s just as true of the best A.I.: Bulldust in, bulldust out.
Finally, Watson's existing linguistic limitations are not to be underestimated. It is surely not trivial that Watson struggles with puns and humour. Futurist Mark Pesce when discussing Watson remarked in passing that scientists don’t understand the "quirks of language and intelligence" that create humour. The question of what makes us laugh does in fact occupy some of the finest minds in cognitive and social science. So we are a long way from being able to mechanise humour. And this matters because for the foreseeable future, it puts a great deal of social intercourse beyond AI's reach.
In between the extremes of laugh-out-loud comedy and a doctor’s dry written notes lies a spectrum of expressive subtleties, like a blush, an uncomfortable laugh, shame, and the humiliation that goes with some patients’ lived experience of illness. Watson may understand the English language, but does it understand people?
Watson can answer questions, but good doctors ask a lot of questions too. When will this amazing computer be able to hold the sort of two-way conversation that we would call a decent "bedside manner"?
Have a disruptive technology implementation story? Get recognised for your leadership. Apply for the 2014 SuperNova Awards for leaders in disruptive technology.
Multi-disciplined healthcare is standard practice today. Yet an important legal precedent to do with information sharing shows how important it is that practitioners do not presuppose how patients weigh health outcomes relative to privacy. As debate continues over opt-in and opt-out models for Patient Controlled Electronic Health Records, the lessons of this case should be re-visited, because it was sympathetic to a patient's right to withhold certain information from their carers for privacy reasons.
In 2004, an oncology patient KJ was being treated at a hospital west of Sydney by a multi-disciplined care team. At one point she consulted with a psychiatrist. Sometime later, notes of her psychiatric sessions were shared with others in the oncology team. KJ objected and complained to the NSW Administrative Decisions Tribunal that her privacy had been violated. Hospital management defended the sharing on the basis that it was normal in modern multi-disciplined healthcare and that it therefore represented reasonable Use of personal information under privacy legislation. However, the tribunal agreed with KJ that she should have been informed in advance that her psychiatric file would be shared with others. That is, the tribunal found that sharing patient information even with other professionals in the same facility constituted Disclosure of Personal Information and not just Use.
In broad terms, under Australian privacy laws, the Disclosure of Sensitive Personal Information generally requires the consent of the individual concerned, whereas Use does not, because it is related to the primary purpose for collection and would be regarded as reasonable by the individual concerned.
There is no argument that the exchange of health information with colleagues caring for the same patient is inherent to most good medical practice. Sharing information would probably be universally regarded by healthcare providers, in the context of privacy legislation, as a reasonable use, closely related to the primary purpose of collecting that information. And yet KJ v Wentworth Area Health Service recognises that the attitudes of patients as to what is reasonable may differ from those of doctors. If there is a significant risk that a given patient would not think it reasonable for information to be shared, then privacy legislation in Australia (as typified by NSW law) requires that their express consent is sought beforehand.
Many healthcare facilities in NSW responded to this case by improving their Privacy Notices. At the time of admission (and hopefully also at other times during their treatment journey) patients should be informed that their Personal Information may be disclosed to other healthcare professionals in the facility. This gives the patient the opportunity to withhold details they do not want disclosed more widely.
The tribunal noted in KJ v Wentworth Area Health Service that "while generally speaking the expression 'disclosure' refers to making personal information available to people outside an agency, in the case of large public sector agencies consisting of specialised units, the exchange of personal information between units may constitute disclosure".
In other words, lay people may perceive there to be greater "distance" between different units in the health system, even within the one hospital, than do healthcare professionals. Legally, it appears that the understandable interests of healthcare professionals to work closely together do not trump a patient's wishes to sometimes keep their Personal Information compartmentalised.
This precedent is important to the design of EHR systems, for it reminds us that the entirety of the record should not be automatically accessible by all providers. But more subtley, it also re-balances the argument often advanced by doctors that opt-in may be injurious because patients might not make the best decisions if they pick-and-choose what parts of their story to include in the EHR. Even if that clinical risk is real, the ruling in KJ vs Wentworth Area Health Service would appear to empower patients to do just that.
In my view, the resolution of this tension lies in better communication, and good faith. What matters above all in electronic health is trust and participation. We know that patients who fear for their privacy will actually decline treatment if they do not trust that their Personal Information will be safe. Whether an EHR is technically opt-in or opt-out doesn't matter in the long run if patients exercise their ultimate right to just stay away. Privacy anxieties may be especially acute around mental health, sexual assault, drug and alcohol abuse and so on. It is imperative for the public health benefits expected from e-health that patients with these sorts of conditions have faith in EHRs and do not simply drop out.
Reference: Case Note: KJ v Wentworth Area Health Service, NSWADT 84, Privacy NSW; Date of Decision: 3 May 2004
I'm going to follow my own advice and not accept the premise of Google's and Facebook's Real Names policy that it somehow is good for quality. My main rebuttal of Real Names is that it's a commercial tactic and not a well grounded worthy social policy.
But here are a few other points I would make if I did want to argue the merits of anonymity - a quality and basic right I honestly thought was unimpeachable!
Nothing to hide? Puhlease!
Much of the case for Real Names riffs on the tired old 'nothing to hide' argument. This tough-love kind of view that respectable people should not be precious about privacy tends to be the preserve of middle class, middle aged white men who through accident of birth have never personally experienced persecution, or had grounds to fear it.
I wish more of the privileged captains of the Internet could imagine that expressing one's political or religious views (for example) brings personal risks to many of the dispossessed or disadvantaged in the world. And as Identity Woman points out, we're not just talking about resistance fighters in the Middle East but also women in 21st century America who are pilloried for challenging the sexist status quo!
Some have argued that people who fear for their own safety should take their networking offline. That's an awfully harsh perpetuation of the digital divide. I don't deny that there are other ways for evil states to track us down online, and that using pseudonyms is no guarantee of safety. The Internet is indeed a risky place for conducting resistance for those who have mortal fears of surveillance. But ask the people who recently rose up on the back of social media if the risks were worth it, and the answer will be yes. Now ask them if the balance changes under a Real Names policy. And who benefits?
Some of the Internet metaphors are so bad they’re not even wrong
Some continue to compare the Internet with a "public square" and suggest there should be no expectation of privacy. In response, I note first of all that the public-private dichotomy is a red herring. Information privacy law is about controlling the flow of Personally Identifiable Information. Most privacy law doesn't care whether PII has come from the public domain or not: corporations and governments are not allowed to exploit PII harvested without consent.
Let's remember the standard set piece of spy movies where agents retreat to busy squares to have their most secret conversations. One's everyday activities in "public" are actually protected in many ways by the nature of the traditional social medium. Our voices don't carry far, and we can see who we're talking to. Our disclosures are limited to the people in our vicinity, we can whisper or use body language to obfuscate our messages, there is no retention of our PII, and so on. These protections are shattered by information technologies.
If Google's and Facebook's call for the end of anonymity were to extend to public squares, we'd be talking about installing CCTVs, tatooing peoples' names on their foreheads, recording everyone's comings and goings, and providing those records to any old private company to make whatever commercial use they see fit.
Medical OSN apartheid
What about medical social networking, which is one of the next frontiers for patient centric care, especially of mental health. Are patients supposed to use their real names for "transparency" and "integrity"? Of course not, because studies show participation in healthcare in general depends on privacy, and many patients decline to seek treatment if they fear they will be exposed.
Now, Real Names advocates would no doubt seek to make medical OSN a special case, but that would imply an expectation that all healthcare discussions be taken off regular social circles. That's just not how real life socialising occurs.
Anonymity != criminality
There's a recurring angle that anonymity is somehow unlawful or unscrupulous. This attitude is based more on guesswork than criminology. If there were serious statistics on crime being aided and abetted by anonymity then we could debate this point, but there aren't. All we have are wild pronouncements like Eugene Kaspersky's call for an Internet Passport. It seems to me that a great deal of crime is enabled by having too much identity online. It's ludicrous that I should hand over so much Personal Information to establish my bona fides in silly little transactions, when we all know that data is being hoovered up and used behind our backs by identity thieves.
And the idea that OSNs have crime prevention at heart when they force us to use "real names" is a little disingenuous when their response to bullying, child pornography, paedophilia and so on has for so long been characterised by keeping themselves at a cool distance.
What’s real anyway?
What’s so real about "real names" anyway? It's not like Google or Facebook they can check them (in fact, when it suited their purposes, the OSNs previously disclaimed any ability to verify names).
But more's the point, given names are arbitrary. It's perfectly normal for people growing up to not "identify with" the names their parents picked for them (or indeed to not identity with their parents at all). We all put some distance between our adult selves and our childhoods. A given family name is no more real in any social sense than any other handle we choose for ourselves.
The demise of Google Health might be a tactical retreat, but we need to understand what’s going on here and what it means for programs like Australia’s Patient-Controlled Electronic Health Record (PCEHR) and other commercial Personal EHRs like Microsoft’s HealthVault. On its face, it's sobering that the might and talent of Google hasn't been able to serve up a good solution.
There's no simple recipe for electronic health records; healthcare overall is an intractable system. Here are just a few things to think about, based on my time in e-health and working with medical devices:
1. Presentation of health information is hugely challenging. And healthcare providers and patients have totally different perspectives. Much more work needs to be done on the interfaces, and Google may feel that it’s better not to put off too many users at this stage with sub-optimal GUIs (especially if they need overhauling).
2. Clinical data on its own is near useless; it needs to go hand-in-glove with human expertise and also clinical applications (which feed data into the record, and extract data into decision support systems). The utility of PHRs used in isolation of healthcare experts still seems to be a wide open research field. What will patients be able to make of their own health data? Are PEHRs really only of interest to the "worried well"? Will it help or hinder when patients come to run their personal records through artifical intelligence services on the web?
3. Google is battered and bruised by a string of privacy controversies. While it bravely recovers its position and credibility after the Buzz and Street View wifi misadventures, it is exhibiting fresh caution; for instance they have put facial recognition on ice, with Eric Schmidt showing his soft side and calling it ‘too creepy’. [Maybe Google is going to tackle privacy in the same that Microsoft utterly revamped its security posture?] In any event, the last thing they need right now is a health related privacy stoush. At the end of the day, Google must make money out of e-health (and that’s entirely legitimate) but the business model may need a lot more careful work.
Points one and two apply to all PEHR/PCEHRs.
Designing an EHR dashboard that presents just the right information for the patient at hand, according to their current condition and the viewer’s particular interest, is a stupendous and fascinating task. Every clinical condition is different, and what a physician or patient really needs to see varies dramatically and deeply from one case to the next. It may require carefully characterising the everyman patient (actually, chronic patient, ambulatory out-patient, well person, parent ...) as well as the everyman healthcare professional (actually, nurse, GP, emergency intensivist, cardiologist ...).
We've all seen the literally fantastic videos of the hospital of the future, with physicians waltzing from bed to bed, bringing up multi-media charts on their tablet computers, and whizzing through test results, real time ECGs, decision support and so on. It looks great -- but aren’t they all just mock-ups?
STOP PRESS: The build-up to launch of Google+ recently might have also helped push Google Health back onto the drawing board. They may have sought to clear the decks for their privacy and governance teams!