Last month, over September 26-27, I attended a US government workshop on The Use of Blockchain in Healthcare and Research, organised by the Department of Health & Human Services Office of the National Coordinator (ONC) and hosted at NIST headquarters at Gaithersburg, Maryland. The workshop showcased a number of winning entries from ONC's Blockchain Challenge, and brought together a number of experts and practitioners from NIST and the Department of Homeland Security.
I presented an invited paper "Blockchain's Challenges in Real Life" (PDF) alongside other new research by Mance Harmon from Ping Identity, and Drummond Reed from Respect Network. All the workshop presentations, the Blockchain Challenge winners' papers and a number of the unsuccessful submissions are available on the ONC website. You will find contributions from major computer companies and consultancies, leading medical schools and universities, and a number of unaffiliated researchers.
I also sat on a panel session about identity innovation, joining entrepreneurs from Digital Bazaar, Factom, Respect Network, and XCELERATE, all of which are conducting R&D projects funded by the DHS Science and Technology division.
Around the same time as the workshop, I happened to finalise two new Constellation Research papers, on security and R&D practices for blockchain technologies. And that was timely, because I am afraid that once again, I have immersed myself in some of the most current blockchain thinking, only to find that key pieces of the puzzle are still missing.
Disclosure: I traveled to the Blockchain in Healthcare workshop as a guest of ONC, which paid for my transport and accommodation.
Three observations from the Workshop
There were two things I just did not get as I read the winning Blockchain Challenge papers and listened to the presentations. And I observe that there is one crucial element that most of the proposals are missing
Firstly, one of the most common themes across all of the papers was interoperability. A great challenge in e-health is indeed interoperability. Disparate health systems speak different languages, using different codes for the same medical procedures. Adoption of new standard terminologies and messaging standards, like HL-7 and ICD, is infamously slow, often taking a decade or longer. Large clinical systems are notoriously complex to implement, so along the way they invariably undergo major customisation, which makes each installation peculiar to its setting, and resistant to interfacing with other systems.
In the USA, Health Information Exchanges (HIEs) have been a common response to these problems, the idea being that an intermediary switching system can broker understanding between local e-health programs. But as anyone in the industry knows, HIEs have been easier said than done, to say the least.
According to many of the ONC challenge papers, blockchain is supposed to bring a breakthrough, yet no one has explained how a ledger will make the semantics of all these e-health silos suddenly compatible. Blockchain is a very specific protocol that addresses the order of entries in a distributed ledger, to prevent Double Spend without an administrator. Nothing about blockchain's fundamentals relates to the contents of messages, healthcare semantics, medical codes and so on. It just doesn't "do" interoperability! The complexity in healthcare is intrinsic to the subject matter; it cannot be willed away with any new storage technology.
The second thing I just didn't get about the workshop was the idea that blockchain will fix healthcare information silos. Several speakers stressed the problem that data is fragmented, concentrated in local repositories, and hard to find when needed. All true, but I don't see what blockchain can do about this. A consensus was reached at the workshop that personal information and Protected Health Information (PHI) should not be stored on the blockchain in any significant amounts (not just because of its sensitivity but also the sheer volume of electronic health records and images in particular). So if we're agreed that the blockchain could only hold pointers to health data, what difference can it make to the current complex of record systems?
And my third problem at the workshop was the stark omission of key management. This is the central administrative challenge in any security system, of getting the right cryptographic keys and credentials into the right hands, so all parties can be sure who they are dealing with. The thing about blockchain is that it did away with key management. The genius of the original Bitcoin blockchain is it allows people to exchange guaranteed value without needing to know anything about each other. Blockchain actually dispenses with key management and it may be unique in the history of security for doing so (see also Blockchain has no meaning). But when we do need to know who's who in a health system – to be certain when various users really are authorised medicos, researchers, insurers or patients – then key management must return to the mix. And then things get complicated, much more complicated than the utopian setting of Bitcoin.
Moreover, healthcare is hierarchical. Inherent to the system are management structures, authorizations, credentialing bodies, quality assurance and audits – all the things that blockchain's creator Satoshi Nakamoto expressly tried to get rid of. As I explained in my workshop speech, if a blockchain deployment still has to involve third parties, then the benefits of the algorithm are lost. So said Nakamoto him/herself!
In my view, most blockchain for healthcare projects will discover, sooner or later, than once the necessary key management arrangements are taken care of, their choice of distributed ledger technology becomes inconsequential.
New Constellation Research on Blockchain Technologies
Security for blockchains and Distributed Ledger Technologies (DLTs) have evolved quickly. As soon as interest in blockchain grew past crypto-currency into mainstream business applications, it became apparent that the core ledger would need to augmented with permissions for access control, and encryption for confidentiality. But what few people appreciate is that these measures conflict with the rationale of the original blockchain algorithm, which was expressly meant to dispel administration layers. The first of my new papers looks at these tensions, what they mean for public and private blockchain systems, paints a picture for third generation DLTs.
The uncomfortable marriage of ad hoc security and the early blockchain is indicative of a broader problem I've written about many times: too much blockchain "innovation" is proceeding with insufficient rigor. Which brings us to the second of my new papers. In the rush to apply blockchain to broader payments and real world assets, few entrepreneurs have been clear and precise about the problems they think they’re solving. If the R&D is not properly grounded, then the resulting solutions will be weak and will ultimately fail in the market. It must be appreciated that the original blockchain was only a prototype. Great care needs to be taken to learn from it and more rigorously adapt fast-evolving DLTs to enterprise needs.
Constellation ShortListTM for Distributed Ledger Technologies Labs
Finally, Constellation Research has launched a new product, the Constellation ShortListTM. These are punchy lists by our analysts of leading technologies in dozens of different categories, which will each be refreshed on a short cycle. The objective is to help buyers of technology when choosing offerings in new areas.
My Constellation ShortListTM for blockchain-related solution providers is now available here.
World Wide Web inventor Sir Tim Berners-Lee has given a speech in London, re-affirming the importance of privacy, but unfortunately he has muddied the waters by casting aspersions on privacy law. Berners-Lee makes a technologist's error, calling for unworkable new privacy mechanisms where none in fact are warranted.
The Telegraph reports Berners-Lee as saying "Some people say privacy is dead – get over it. I don't agree with that. The idea that privacy is dead is hopeless and sad." He highlighted that peoples' participation in potentially beneficial programs like e-health is hampered by a lack of trust, and a sense that spying online is constant.
Of course he's right about that. Yet he seems to underestimate the data privacy protections we already have. Instead he envisions "a world in which I have control of my data. I can sell it to you and we can negotiate a price, but more importantly I will have legal ownership of all the data about me" he said according to The Telegraph.
It's a classic case of being careful what you ask for, in case you get it. What would control over "all data about you" look like? Most of the data about us these days - most of the personal data, aka Personally Identifiable Information (PII) - is collected or created behind our backs, by increasingly sophisticated algorithms. Now, people certainly don't know enough about these processes in general, and in too few cases are they given a proper opportunity to opt in to Big Data processes. Better notice and consent mechanisms are needed for sure, but I don't see that ownership could fix a privacy problem.
What could "ownership" of data even mean? If personal information has been gathered by a business process, or created by clever proprietary algorithms, we get into obvious debates over intellectual property. Look at medical records: in Australia and I suspect elsewhere, it is understood that doctors legally own the medical records about a patient, but that patients have rights to access the contents. The interpretation of medical tests is regarded as the intellectual property of the healthcare professional.
The philosophical and legal quandries are many. With data that is only potentially identifiable, at what point would ownership flip from the data's creator to the individual to whom it applies? What if data applies to more than one person, as in household electricity records, or, more seriously, DNA?
What really matters is preventing the exploitation of people through data about them. Privacy (or, strictly speaking, data protection) is fundamentally about restraint. When an organisation knows you, they should be restrained in what they can do with that knowledge, and not use it against your interests. And thus, in over 100 countries, we see legislated privacy principles which require that organisations only collect the PII they really need for stated purposes, that PII collected for one reason not be re-purposed for others, that people are made reasonably aware of what's going on with their PII, and so on.
Berners-Lee alluded to the privacy threats of Big Data, and he's absolutely right. But I point out that existing privacy law can substantially deal with Big Data. It's not necessary to make new and novel laws about data ownership. When an algorithm works out something about you, such as your risk of developing diabetes, without you having to fill out a questionnaire, then that process has collected PII, albeit indirectly. Technology-neutral privacy laws don't care about the method of collection or creation of PII. Synthetic personal data, collected as it were algorithmically, is treated by the law in the same way as data gathered overtly. An example of this principle is found in the successful European legal action against Facebook for automatic tag suggestions, in which biometric facial recognition algorithms identify people in photos without consent.
Technologists often under-estimate the powers of existing broadly framed privacy laws, doubtless because technology neutrality is not their regular stance. It is perhaps surprising, yet gratifying, that conventional privacy laws treat new technologies like Big Data and the Internet of Things as merely potential new sources of personal information. If brand new algorithms give businesses the power to read the minds of shoppers or social network users, then those businesses are limited in law as to what they can do with that information, just as if they had collected it in person. Which is surely what regular people expect.
On July 23, BlackBerry hosted its second annual Security Summit, once again in New York City. As with last year’s event, this was a relatively intimate gathering of analysts and IT journalists, brought together for the lowdown on BlackBerry’s security and privacy vision.
By his own account, CEO John Chen has met plenty of scepticism over his diverse and, some say, chaotic product and services portfolio. And yet it’s beginning to make sense. There is a strong credible thread running through Chen’s initiatives. It all has to do with the Internet of Things.
Disclosure: I traveled to the Blackberry Security Summit as a guest of Blackberry, which covered my transport and accommodation.
The Growth Continues
In 2014, John Chen opened the show with the announcement he was buying the German voice encryption firm Secusmart. That acquisition appears to have gone well for all concerned; they say nobody has left the new organisation in the 12 months since. News of BlackBerry’s latest purchase - of crisis communications platform AtHoc - broke a few days before this year’s Summit, and it was only the most recent addition to the family. In the past 12 months, BlackBerry has been busy spending $150M on inorganic growth, picking up:
Chen has also overseen an additional $100M expenditure in the same timeframe on organic security expansion (over and above baseline product development). Amongst other things BlackBerry has:
The Growth Explained - Secure Mobile Communications
Executives from different business units and different technology horizontals all organised their presentations around what is now a comprehensive security product and services matrix. It looks like this (before adding AtHoc):
BlackBerry is striving to lead in Secure Mobile Communications. In that context the highlights of the Security Summit for mine were as follows.
The Internet of Things
BlackBerry’s special play is in the Internet of Things. It’s the consistent theme that runs through all their security investments, because as COO Marty Beard says, IoT involves a lot more than machine-to-machine communications. It’s more about how to extract meaningful data from unbelievable numbers of devices, with security and privacy. That is, IoT for BlackBerry is really a security-as-a-service play.
Chief Security Officer David Kleidermacher repeatedly stressed the looming challenge of “how to patch and upgrade devices at scale”.
- MyPOV: Functional upgrades for smart devices will of course be part and parcel of IoT, but at the same time, we need to work much harder to significantly reduce the need for reactive security patches. I foresee an angry consumer revolt if things that never were computers start to behave and fail like computers. A radically higher standard of quality and reliability is required. Just look at the Jeep Uconnect debacle, where it appears Chrysler eventually thought better of foisting a patch on car owners and instead opted for a much more expensive vehicle recall. It was BlackBerry’s commitment to ultra high reliability software that really caught my attention at the 2014 Security Summit, and it convinces me they grasp what’s going to be required to make ubiquitous computing properly seamless.
Refreshingly, COO Beard preferred to talk about economic value of the IoT, rather than the bazillions of devices we are all getting a little jaded about. He said the IoT would bring about $4 trillion of required technology within a decade, and that the global economic impact could be $11 trillion.
BlackBerry’s real time operating system QNX is in 50 million cars today.
AtHoc is a secure crisis communications service, with its roots in the first responder environment. It’s used by three million U.S. government workers today, and the company is now pushing into healthcare.
Founder and CEO Guy Miasnik explained that emergency communications involves more than just outbound alerts to people dealing with disasters. Critical to crisis management is the secure inbound collection of info from remote users. AtHoc is also not just about data transmission (as important as that is) but it works also at the application layer, enabling sophisticated workflow management. This allows procedures for example to be defined for certain events, guiding sets of users and devices through expected responses, escalating issues if things don’t get done as expected.
We heard more about BlackBerry’s collaboration with Oxford University on the Centre for High Assurance Computing Excellence, first announced in April at the RSA Conference. CHACE is concerned with a range of fundamental topics, including formal methods for verifying program correctness (an objective that resonates with BlackBerry’s secure operating system division QNX) and new security certification methodologies, with technical approaches based on the Common Criteria of ISO 15408 but with more agile administration to reduce that standard’s overhead and infamous rigidity.
CSO Kleidermacher announced that CHACE will work with the Diabetes Technology Society on a new healthcare security standards initiative. The need for improved medical device security was brought home vividly by an enthralling live demonstration of hacking a hospital drug infusion pump. These vulnerabilities have been exposed before at hacker conferences but BlackBerry’s demo was especially clear and informative, and crafted for a non-technical executive audience.
- MyPOV: The message needs to be broadcast loud and clear: there are life-critical machines in widespread use, built on commercial computing platforms, without any careful thought for security. It’s a shameful and intolerable situation.
I was impressed by BlackBerry’s privacy line. It's broader and more sophisticated than most security companies, going way beyond the obvious matters of encryption and VPNs. In particular, the firm champions identity plurality. For instance, WorkLife by BlackBerry, powered by Movirtu technology, realizes multiple identities on a single phone. BlackBerry is promoting this capability in the health sector especially, where there is rarely a clean separation of work and life for professionals. Chen said he wants to “separate work and private life”.
The health sector in general is one of the company’s two biggest business development priorities (the other being automotive). In addition to sophisticated telephony like virtual SIMs, they plan to extend extend AtHoc into healthcare messaging, and have tasked the CHACE think-tank with medical device security. These actions complement BlackBerry’s fine words about privacy.
So BlackBerry’s acquisition plan has gelled. It now has perhaps the best secure real time OS for smart devices, a hardened device-independent Mobile Device Management backbone, new data-centric privacy and rights management technology, remote certificate management, and multi-layered emergency communications services that can be diffused into mission-critical rules-based e-health settings and, eventually, automated M2M messaging. It’s a powerful portfolio that makes strong sense in the Internet of Things.
BlackBerry says IoT is 'much more than device-to-device'. It’s more important to be able to manage secure data being ejected from ubiquitous devices in enormous volumes, and to service those things – and their users – seamlessly. For BlackBerry, the Internet of Things is really all about the service.
The Australian government is to revamp the troubled Personally Controlled Electronic Health Record (PCEHR). In line with the Royle Review from Dec 2013, it is reported that patient participation is to change from the current Opt-In model to Opt-Out; see "Govt to make e-health records opt-out" by Paris Cowan, IT News.
That is to say, patient data from hospitals, general practice, pathology and pharmacy will be added by default to a central longitudinal health record, unless patients take steps (yet to be specified) to disable sharing.
The main reason for switching the consent model is simply to increase the take-up rate. But it's a much bigger change than many seem to realise.
The government is asking the community to trust it to hold essentially all medical records. Are the PCEHR's security and privacy safeguards up to scratch to take on this grave responsibility? I argue the answer is no, on two grounds.
Firstly there is the practical matter of PCEHR's security performance to date. It's not good, based on publicly available information. On multiple occasions, prescription details have been uploaded from community pharmacy to the wrong patient's records. There have been a few excuses made for this error, with blame sheeted home to the pharmacy. But from a system's perspective -- and health care is all about the systems -- you cannot pass the buck like that. Pharmacists are using a PCEHR system that was purportedly designed for them. And it was subject to system-wide threat & risk assessments that informed the architecture and design of not just the electronic records system but also the patient and healthcare provider identification modules. How can it be that the PCEHR allows such basic errors to occur?
Secondly and really fundamentally, you simply cannot invert the consent model as if it's a switch in the software. The privacy approach is deep in the DNA of the system. Not only must PCEHR security be demonstrably better than experience suggests, but it must be properly built in, not retrofitted.
Let me explain how the consent approach crops up deep in the architecture of something like PCEHR. During analysis and design, threat & risk assessments (TRAs) and privacy impact assessments (PIAs) are undertaken, to identify things that can go wrong, and to specify security and privacy controls. These controls generally comprise a mix of technology, policy and process mechanisms. For example, if there is a risk of patient data being sent to the wrong person or system, that risk can be mitigated a number of ways, including authentication, user interface design, encryption, contracts (that obligate receivers to act responsibly), and provider and patient information. The latter are important because, as we all should know, there is no such thing as perfect security. Mistakes are bound to happen.
One of the most fundamental privacy controls is participation. Individuals usually have the ultimate option of staying away from an information system if they (or their advocates) are not satisfied with the security and privacy arrangements. Now, these are complex matters to evaluate, and it's always best to assume that patients do not in fact have a complete understanding of the intricacies, the pros and cons, and the net risks. People need time and resources to come to grips with e-health records, so a default opt-in affords them that breathing space. And it errs on the side of caution, by requiring a conscious decision to participate. In stark contrast, a default opt-out policy embodies a position that the scheme operator believes it knows best, and is prepared to make the decision to participate on behalf of all individuals.
Such a position strikes many as beyond the pale, just on principle. But if opt-out is the adopted policy position, then clearly it has to be based on a risk assessment where the pros indisputably out-weigh the cons. And this is where making a late switch to opt-out is unconscionable.
You see, in an opt-in system, during analysis and design, whenever a risk is identified that cannot be managed down to negligible levels by way of technology and process, the ultimate safety net is that people don't need to use the PCEHR. It is a formal risk management ploy (a part of the risk manager's toolkit) to sometimes fall back on the opt-in policy. In an opt-in system, patients sign an agreement in which they accept some risk. And the whole security design is predicated on that.
Look at the most recent PIA done on the PCEHR in 2011; section 9.1.6 "Proposed solutions - legislation" makes it clear that opt-in participation is core to the existing architecture. The PIA makes a "critical legislative recommendation" including:
- a number of measures to confirm and support the 'opt in' nature of the PCEHR for consumers (Recommendations 4.1 to 4.3) [and] preventing any extension of the scope of the system, or any change to the 'opt in' nature of the PCEHR.
The PIA at section 2.2 also stresses that a "key design feature of the PCEHR System ... is opt in – if a consumer or healthcare provider wants to participate, they need to register with the system." And that the PCEHR is "not compulsory – both consumers and healthcare providers choose whether or not to participate".
A PDF copy of the PIA report, which was publicly available at the Dept of Health website for a few years after 2011, is archived here.
The fact is that if the government changes the PCEHR from opt-in to opt-out, it will invalidate the security and privacy assessments done to date. The PIAs and TRAs will have to be repeated, and the project must be prepared for major redesign.
The Royle Review report (PDF) did in fact recommend "a technical assessment and change management plan for an opt-out model ..." (Recommendation 14) but I am not aware that such a review has taken place.
To look at the seriousness of this another way, think about "Privacy by Design", the philosophy that's being steadily adopted across government. In 2014 NEHTA wrote in a submission (PDF) to the Australian Privacy Commissioner:
- The principle that entities should employ “privacy by design” by building privacy into their processes, systems, products and initiatives at the design stage is strongly supported by NEHTA. The early consideration of privacy in any endeavour ensures that the end product is not only compliant but meets the expectations of stakeholders.
One of the tenets of Privacy by Design is that you cannot bolt on privacy after a design is done. Privacy must be designed into the fabric of any system from the outset. All the way along, PCEHR has assumed opt-in, and the last PIA enshrined that position.
If the government was to ignore its own Privacy by Design credo, and not revisit the PCEHR architecture, it would be an amazing breach of the public's trust in the healthcare system.
Acknowledgement: Daniel Barth-Jones kindly engaged with me after this blog was initially published, and pointed out several significant factual errors, for which I am grateful.
In 2014, the New York Taxi & Limousine Company (TLC) released a large "anonymised" dataset containing 173 million taxi rides taken in 2013. Soon after, software developer Vijay Pandurangan managed to undo the hashed taxi registration numbers. Subsequently, privacy researcher Anthony Tockar went on to combine public photos of celebrities getting in or out of cabs, to recreate their trips. See Anna Johnston's analysis here.
This re-identification demonstration has been used by some to bolster a general claim that anonymity online is increasingly impossible.
On the other hand, medical research advocates like Columbia University epidemiologist Daniel Barth-Jones argue that the practice of de-identification can be robust and should not be dismissed as impractical on the basis of demonstrations such as this. The identifiability of celebrities in these sorts of datasets is a statistical anomaly reasons Barth-Jones and should not be used to frighten regular people out of participating in medical research on anonymised data. He wrote in a blog that:
- "However, it would hopefully be clear that examining a miniscule proportion of cases from a population of 173 million rides couldn’t possibly form any meaningful basis of evidence for broad assertions about the risks that taxi-riders might face from such a data release (at least with the taxi medallion/license data removed as will now be the practice for FOIL request data)."
As a health researcher, Barth-Jones is understandably worried that re-identification of small proportions of special cases is being used to exaggerate the risks to ordinary people. He says that the HIPAA de-identification protocols if properly applied leave no significant risk of re-id. But even if that's the case, HIPAA processes are not applied to data across the board. The TLC data was described as "de-identified" and the fact that any people at all (even stand-out celebrities) could be re-identified from data does create a broad basis for concern - "de-identified" is not what it seems. Barth-Jones stresses that in the TLC case, the de-identification was fatally flawed [technically: it's no use hashing data like registration numbers with limited value ranges because the hashed values can be reversed by brute force] but my point is this: who among us who can tell the difference between poorly de-identified and "properly" de-identified?
And how long can "properly de-identified" last? What does it mean to say casually that only a "minuscule proportion" of data can be re-identified? In this case, the re-identification of celebrities was helped by the fact lots of photos of them are readily available on social media, yet there are so many photos in the public domain now, regular people are going to get easier to be identified.
But my purpose here is not to play what-if games, and I know Daniel advocates statistically rigorous measures of identifiability. We agree on that -- in fact, over the years, we have agreed on most things. The point I am trying to make in this blog post is that, just as nobody should exaggerate the risk of re-identification, nor should anyone play it down. Claims of de-identification are made almost daily for really crucial datasets, like compulsorily retained metadata, public health data, biometric templates, social media activity used for advertising, and web searches. Some of these claims are made with statistical rigor, using formal standards like the HIPAA protocols; but other times the claim is casual, made with no qualification, with the aim of comforting end users.
"De-identified" is a helluva promise to make, with far-reaching ramifications. Daniel says de-identification researchers use the term with caution, knowing there are technical qualifications around the finite probability of individuals remaining identifiable. But my position is that the fine print doesn't translate to the general public who only hear that a database is "anonymous". So I am afraid the term "de-identified" is meaningless outside academia, and in casual use is misleading.
Barth-Jones objects to the conclusion that "it's virtually impossible to anonymise large data sets" but in an absolute sense, that claim is surely true. If any proportion of people in a dataset may be identified, then that data set is plainly not "anonymous". Moreover, as statistics and mathematical techniques (like facial recognition) improve, and as more ancillary datasets (like social media photos) become accessible, the proportion of individuals who may be re-identified will keep going up.[Readers who wish to pursue these matters further should look at the recent Harvard Law School online symposium on "Re-identification Demonstrations", hosted by Michelle Meyer, in which Daniel Barth-Jones and I participated, among many others.]
Both sides of this vexed debate need more nuance. Privacy advocates have no wish to quell medical research per se, nor do they call for absolute privacy guarantees, but we do seek full disclosure of the risks, so that the cost-benefit equation is understood by all. One of the obvious lessons in all this is that "anonymous" or "de-identified" on their own are poor descriptions. We need tools that meaningfully describe the probability of re-identification. If statisticians and medical researchers take "de-identified" to mean "there is an acceptably small probability, namely X percent, of identification" then let's have that fine print. Absent the detail, lay people can be forgiven for thinking re-identification isn't going to happen. Period.
And we need policy and regulatory mechanisms to curb inappropriate re-identification. Anonymity is a brittle, essentially temporary, and inadequate privacy tool.
I argue that the act of re-identification ought to be treated as an act of Algorithmic Collection of PII, and regulated as just another type of collection, albeit an indirect one. If a statistical process results in a person's name being added to a hitherto anonymous record in a database, it is as if the data custodian went to a third party and asked them "do you know the name of the person this record is about?". The fact that the data custodian was clever enough to avoid having to ask anyone about the identity of people in the re-identified dataset does not alter the privacy responsibilities arising. If the effect of an action is to convert anonymous data into personally identifiable information (PII), then that action collects PII. And in most places around the world, any collection of PII automatically falls under privacy regulations.
It looks like we will never guarantee anonymity, but the good news is that for privacy, we don't actually need to. Privacy is the protection you need when you affairs are not anonymous, for privacy is a regulated state where organisations that have knowledge about you are restrained in what they do with it. Equally, the ability to de-anonymise should be restricted in accordance with orthodox privacy regulations. If a party chooses to re-identify people in an ostensibly de-identified dataset, without a good reason and without consent, then that party may be in breach of data privacy laws, just as they would be if they collected the same PII by conventional means like questionnaires or surveillance.
Surely we can all agree that re-identification demonstrations serve to shine a light on the comforting claims made by governments for instance that certain citizen datasets can be anonymised. In Australia, the government is now implementing telecommunications metadata retention laws, in the interests of national security; the metadata we are told is de-identified and "secure". In the UK, the National Health Service plans to make de-identified patient data available to researchers. Whatever the merits of data mining in diverse fields like law enforcement and medical research, my point is that any government's claims of anonymisation must be treated critically (if not skeptically), and subjected to strenuous and ongoing privacy impact assessment.
Privacy, like security, can never be perfect. Privacy advocates must avoid giving the impression that they seek unrealistic guarantees of anonymity. There must be more to privacy than identity obscuration (to use a more technically correct term than "de-identification"). Medical research should proceed on the basis of reasonable risks being taken in return for beneficial outcomes, with strong sanctions against abuses including unwarranted re-identification. And then there wouldn't need to be a moral panic over re-identification if and when it does occur, because anonymity, while highly desirable, is not essential for privacy in any case.
Update 22 September 2014
Last week, Apple suddenly went from silent to expansive on privacy, and the thrust of my blog straight after the Apple Watch announcement is now wrong. Apple posted a letter from CEO Tim Cook at www.apple.com/privacy along with a document that sets outs how "We’ve built privacy into the things you use every day".
The paper is very interesting. It's a sophisticated and balanced account of policy, business strategy and technology elements that go to create privacy. Apple highlights that they:
- forswear the exploitation of customer data
- do not scan content or messages
- do not let their small "iAd" business take data from other Apple departments
- require certain privacy protective practices on the part of their health app developers.
They have also provided quite decent information about how Siri and health data is handled.
Apple's stated privacy posture is all about respect and self-restraint. Setting out these principles and commitments is a very welcome development indeed. I congratulate them.
Today Apple launched their much anticipated wrist watch, described by CEO Tim Cook as "the most personal device they have ever developed". He got that right!
Rather more than a watch, it's a sort of guardian angel. The Apple Watch has Siri built-in, along with new haptic sensors and buzzers, a heartbeat monitor, accelerometer, and naturally the GPS and Wi-Fi geolocation capability to track your speed and position throughout the day. So they say "Apple Watch is an all-day fitness tracker and a highly advanced sports watch in a single device".
The Apple Watch will be a paragon of digital disruption. To understand and master disruption today requires the coordination of mobility, Big Data, the cloud and user interfaces. These cannot be treated as isolated technologies, so when a company like Apple controls them all, at scale, real transformation follows.
Thus Apple is one of the few businesses that can make promises like this: "Over time, Apple Watch gets to know you the way a good personal trainer would". In this we hear echoes of the smarts that power Siri, and we are reminded that amid the novel intimacy we have with these devices, many serious privacy problems have yet to be resolved.
The Apple Event today was a play in four acts:
Act I: the iPhone 6 release;
Act II: Apple Pay launch;
Act III: the Apple Watch announcement;
Act IV: U2 played live and released their new album free on iTunes!
It was fascinating to watch the thematic differences across these stanzas. With Apple Pay, they stressed security and privacy; we were told about the Secure Element, the way card numbers are replaced by random numbers (tokenization), and an architecture where Apple cannot see how much you spend nor where you spend it. On the other hand, when it came to the Apple Watch and its integrated health sensors, privacy wasn't mentioned, not at all. We are left to deduce that aggregating personal health data at Apple's servers is a part of a broader plan.
With Siri, Apple sadly fails all these tests.See Update 22 September 2014 above.
It's been left to journalists to try and find out what Apple does with the information it mines from Siri. Wired magazine discovered eventually that Apple retains masked Siri voice recordings for six months; it then purportedly de-identifies them and keeps them for a further 18 months, for research. Yet even these explanations don't touch on the extracted contents of the communications, nor the metadata, like the trends and correlations that go to Siri's learning. If the purpose of Siri is ostensibly to automate the operation of the iPhone and its apps, then Apple should be refrain from using the by-products of Siri's voice processing for anything else.
But we just don't know what they do, and Apple imposes no self-restraint.See Update 22 September 2014 above.
We should hope for radically greater transparency with the Apple Watch and its health apps. Most of the watch's data processing and analytics will be carried out in the cloud. So Apple will come to hold detailed records of its users' exercise regimes, their performance figures, trend data and correlations. These are health records. Inevitably, health applications will take in other medical data, like food diaries entered by users, statistics imported from other databases, and detailed measurements from Internet-connected scales, blood pressure monitors and even medical devices. Apple will see what we're doing to improve our health, day by day, year on year. They will come to know more about what's making us healthy and what's not than we do ourselves.
Now, the potential benefits from this sort of personal technology to self-managed care and preventative medicine are enormous. But so are the data management and privacy obligations.
Within the US, Apple will doubtless be taking steps to avoid falling under the stringent HIPAA regulations, yet in the rest of the world, a more subtle but far-reaching problem looms. Many broad based data privacy regimes forbid the collection of health information without consent. And the laws of the European Union, Australia, New Zealand and elsewhere are generally technology neutral. This means that data collected directly from patients or doctors, and fresh data collected by way of automated algorithms are treated essentially the same way. So when a sophisticated health management app running in the cloud somewhere mines all that exercise and lifestyle data, and starts to make inferences about health and wellbeing, great care needs to be taken that the indiviuals concerned know what's going on in advance, and have given their informed consent.
It ought to be possible to expressly opt in to Big Data processes when you can understand the pros and cons and the net benefits, and to later opt out, and opt back in again, as the benefit equation shifts over time. But even visualising the products of Big Data is hard; I believe graphical user interfaces (GUIs) to allow people to comprehend and actively control the process will be one of the great software design problems of our age.
Apple are obviously preeminent in GUI and user experience innovation. You would think if anyone can create the novel yet intuitive interfaces desperately needed to control Big Data PII, Apple can. But first they will have to embrace their responsibilities for the increasingly intimate details they are helping themselves to. If the Apple Watch is "the most personal device they've ever designed" then let's see privacy and data protection commitments to match.
Second Day Reflections from CIS Monterey.
Follow along on Twitter at #CISmcc (for the Monterey Conference Centre).
The Attributes push
At CIS 2013 in Napa a year ago, several of us sensed a critical shift in focus amongst the identerati - from identity to attributes. OIX launched the Attributes Exchange Network (AXN) architecture, important commentators like Andrew Nash were saying, 'hey, attributes are more interesting than identity', and my own #CISnapa talk went so far as to argue we should forget about identity altogether. There was a change in the air, but still, it was all pretty theoretical.
Twelve months on, and the Attributes push has become entirely practical. If there was a Word Cloud for the NSTIC session, my hunch is that "attributes" would dominate over "identity". Several live NSTIC pilots are all about the Attributes.
ID.me is a new company started by US military veterans, with the aim of improving access for the veterans community to discounted goods and services and other entitlements. Founders Matt Thompson and Blake Hall are not identerati -- they're entirely focused on improving online access for their constituents to a big and growing range of retailers and services, and offer a choice of credentials for proving veterans bona fides. It's central to the ID.me model that users reveal as little as possible about their personal identities, while having their veterans' status and entitlements established securely and privately.
Another NSTIC pilot Relying Party is the financial service sector infrastructure provider Broadridge. Adrian Chernoff, VP for Digital Strategy, gave a compelling account of the need to change business models to take maximum advantage of digital identity. Broadridge recently announced a JV with Pitney Bowes called Inlet, which will enable the secure sharing of discrete and validated attributes - like name, address and social security number - in an NSTIC compliant architecture.
Yesterday I said in my #CISmcc diary that I hoped to change my mind about something here, and half way through Day 2, I was delighted it was already happening. I've got a new attitude about NSTIC.
Over the past six months, I had come to fear NSTIC had lost its way. It's hard to judge totally accurately when lurking on the webcast from Sydney (at 4:00am) but the last plenary seemed pedestrian to me. And I'm afraid to say that some NSTIC committees have got a little testy. But today's NSTIC session here was a turning point. Not only are there a number or truly exciting pilots showing real progress, but Jeremy Grant has credible plans for improving accountability and momentum, and the new technology lead Paul Grassi is thinking outside the box and speaking out of school. The whole program seems fresh all over again.
In a packed presentation, Grassi impressed me enormously on a number of points:
- Firstly, he advocates a pragmatic NSTIC-focused extension of the old US government Authentication Guide NIST SP 800-63. Rather than a formal revision, a companion document might be most realistic. Along the way, Grassi really nailed an issue which we identity professionals need to talk about more: language. He said that there are words in 800-63 that are "never used anywhere else in systems development". No wonder, as he says, it's still "hard to implement identity"!
- Incidentally I chatted some more with Andrew Hughes about language; he is passionate about terms, and highlights that our term "Relying Party" is an especially terrible distraction for Service Providers whose reason-for-being has nothing to do with "relying" on anyone!
- Secondly, Paul Grassi wants to "get very aggressive on attributes", including emphasis on practical measurement (since that's really what NIST is all about). I don't think I need to say anything more about that than Bravo!
- And thirdly, Grassi asked "What if we got rid of LOAs?!". This kind of iconoclastic thinking is overdue, and was floated as part of a broad push to revamp the way government's orthodox thinking on Identity Assurance is translated to the business world. Grassi and Grant don't say LOAs can or should be abandoned by government, but they do see that shoving the rounded business concepts of identity into government's square hole has not done anyone much credit.
Just one small part of NSTIC annoyed me today: the persistent idea that federation hubs are inherently simpler than one-to-one authentication. They showed the following classic sort of 'before and after' shots, where it seems self-evident that a hub (here the Federal Cloud Credential Exchange FCCX) reduces complexity. The reality is that multilateral brokered arrangements between RPs and IdPs are far more complex than simple bilateral direct contracts. And moreover, the new forms of agreements are novel and untested in real world business. The time and cost and unpredictability of working out these new arrangements is not properly accounted for and has often been fatal to identity federations.
The dog barks and this time the caravan turns around
One of the top talking points at #CISmcc has of course been FIDO. The FIDO Alliance goes from strength to strength; we heard they have over 130 members now (remember it started with four or five less than 18 months ago). On Saturday afternoon there was a packed-out FIDO show case with six vendors showing real FIDO-ready products. And today there was a three hour deep dive into the two flagship FIDO protocols UAF (which enables better sharing of strong authentication signals such that passwords may be eliminated) and U2F (which standardises and strengthens Two Factor Authentication).
FIDO's marketing messages are improving all the time, thanks to a special focus on strategic marketing which was given its own working group. In particular, the Alliance is steadily clarifying the distinction between identity and authentication, and sticking adamantly to the latter. In other words, FIDO is really all about the attributes. FIDO leaves identity as a problem to be addressed further up the stack, and dedicates itself to strengthening the authentication signal sent from end-point devices to servers.
The protocol tutorials were excellent, going into detail about how "Attestation Certificates" are used to convey the qualities and attributes of authentication hardware (such as device model, biometric modality, security certifications, elapsed time since last user verification etc) thus enabling nice fine-grained policy enforcement on the RP side. To my mind, UAF and U2F show how nature intended PKI to have been used all along!
Some confusion remains as to why FIDO has two protocols. I heard some quiet calls for UAF and U2F to converge, yet that would seem to put the elegance of U2F at risk. And it's noteworthy that U2F is being taken beyond the original one time password 2FA, with at least one biometric vendor at the showcase claiming to use it instead of the heavier UAF.
Surprising use cases
Finally, today brought more fresh use cases from cohorts of users we socially privileged identity engineers for the most part rarely think about. Another NSTIC pilot partner is AARP, a membership organization providing "information, advocacy and service" to older people, retirees and other special needs groups. AARP's Jim Barnett gave a compelling presentation on the need to extend from the classic "free" business models of Internet services, to new economically sustainable approaches that properly protect personal information. Barnett stressed that "free" has been great and 'we wouldn't be where we are today without it' but it's just not going to work for health records for example. And identity is central to that.
There's so much more I could report if I had time. But I need to get some sleep before another packed day. All this changing my mind is exhausting.
Cheers again from Monterey.
First Day Reflections from CIS Monterey.
Follow along on Twitter at #CISmcc (for the Monterey Conference Centre).
The Cloud Identity Summit really is the top event on the identity calendar. The calibre of the speakers, the relevance and currency of the material, the depth and breadth of the cohort, and the international spread are all unsurpassed. It's been great to meet old cyber-friends in "XYZ Space" at last -- like Emma Lindley from the UK and Lance Peterman. And to catch up with such talented folks like Steffen Sorensen from New Zealand once again.
A day or two before, Ian Glazer of Salesforce asked in a tweet what we were expecting to get out of CIS. And I replied that I hoped to change my mind about something. It's unnerving to have your understanding and assumptions challenged by the best in the field ... OK, sometimes it's outright embarrassing ... but that's what these events are all about. A very wise lawyer said to me once, around 1999 at the dawn of e-commerce, that he had changed his mind about authentication a few times up to that point, and that he fully expected to change his mind again and again.
I spent most of Saturday in Open Identity Foundation workshops. OIDF chair Don Thibeau enthusiastically stressed two new(ish) initiatives: Mobile Connect in conjunction with the mobile carrier trade association GSM Association @GSMA, and HIE Connect for the health sector. For the uninitiated, HIE means Health Information Exchange, namely a hub for sharing structured e-health records among hospitals, doctors, pharmacists, labs, e-health records services, allied health providers, insurers, drug & device companies, researchers and carers; for the initiated, we know there is some language somewhere in which the letters H.I.E. stand for "Not My Lifetime".
But seriously, one of the best (and pleasantly surprising) things about HIE Connect as the OIDF folks tell it, is the way its leaders unflinchingly take for granted the importance of privacy in the exchange of patient health records. Because honestly, privacy is not a given in e-health. There are champions on the new frontiers like genomics that actually say privacy may not be in the interests of the patients (or more's the point, the genomics businesses). And too many engineers in my opinion still struggle with privacy as something they can effect. So it's great -- and believe me, really not obvious -- to hear the HIE Connects folks -- including Debbie Bucci from the US Dept of Health and Human Services, and Justin Richer of Mitre and MIT -- dealing with it head-on. There is a compelling fit for the OAUTH and OIDC protocols here, with their ability to manage discrete pieces of information about users (patients) and to permission them all separately. Having said that, Don and I agree that e-health records permissioning and consent is one of the great UI/UX challenges of our time.
Justin also highlighted that the RESTful patterns emerging for fine-grained permissions management in healthcare are not confined to healthcare. Debbie added that the ability to query rare events without undoing privacy is also going to be a core defining challenge in the Internet of Things.
MyPOV: We may well see tremendous use cases for the fruits of HIE Exchange before they're adopted in healthcare!
In the afternoon, we heard from Canadian and British projects that have been working with the Open Identity Exchange (OIX) program now for a few years each.
Emma Lindley presented the work they've done in the UK Identity Assurance Program (IDAP) with social security entitlements recipients. These are not always the first types of users we think of for sophisticated IDAM functions, but in Britain, local councils see enormous efficiency dividends from speeding up the issuance of eg disabled parking permits, not to mention reducing imposters, which cost money and lead to so much resentment of the well deserved. Emma said one Attributes Exchange beta project reduced the time taken to get a 'Blue Badge' permit from 10 days to 10 minutes. She went on to describe the new "Digital Sources of Trust" initiative which promises to reconnect under-banked and under-documented sections of society with mainstream financial services. Emma told me the much-abused word "transformational" really does apply here.
MyPOV: The Digital Divide is an important issue for me, and I love to see leading edge IDAM technologies and business processes being used to do something about it -- and relatively quickly.
Then Andre Boysen of SecureKey led a discussion of the Canadian identity ecosystem, which he said has stabilised nicely around four players: Federal Government, Provincial Govt, Banks and Carriers. Lots of operations and infrastructure precedents from the payments industry have carried over.
Andre calls the smart driver license of British Columbia the convergence of "street identity and digital identity".
MyPOV: That's great news - and yet comparable jurisdictions like Australia and the USA still struggle to join governments and banks and carriers in an effective identity synthesis without creating great privacy and commercial anxieties. All three cultures are similarly allergic to identity cards, but only in Canada have they managed to supplement drivers licenses with digital identities with relatively high community acceptance. In nearly a decade, Australia has been at a standstill in its national understanding of smartcards and privacy.
For mine, the CIS Quote of the Day came from Scott Rice of the Open ID Foundation. We all know the stark problem in our industry of the under-representation of Relying Parties in the grand federated identity projects. IdPs and carriers so dominate IDAM. Scott asked us to imagine a situation where "The auto industry was driven by steel makers". Governments wouldn't put up with that for long.
Can someone give us the figures? I wonder if Identity and Access Management is already more economically ore important than cars?!
Cheers from Monterey, Day 1.
For the past year, oncologists at the Memorial Sloan Kettering Cancer Centre in New York have been training IBM’s Watson – the artificial intelligence tour-de-force that beat allcomers on Jeopardy – to help personalise cancer care. The Centre explains that "combining [their] expertise with the analytical speed of IBM Watson, the tool has the potential to transform how doctors provide individualized cancer treatment plans and to help improve patient outcomes". Others are speculating already that Watson could "soon be the best doctor in the world".
I have no doubt that when Watson and things like it are available online to doctors worldwide, we will see overall improvements in healthcare outcomes, especially in parts of the world now under-serviced by medical specialists [having said that, the value of diagnosing cancer in poor developing nations is questionable if they cannot go on to treat it]. As with Google's self-driving car, we will probably get significant gains eventually, averaged across the population, from replacing humans with machines. Yet some of the foibles of computing are not well known and I think they will lead to surprises.
For all the wondrous gains made in Artificial Intelligence, where Watson now is the state-of-the art, A.I. remains algorithmic, and for that, it has inherent limitations that don't get enough attention. Computer scientists and mathematicians have know for generations that some surprisingly straightforward problems have no algorithmic solution. That is, some tasks cannot be accomplished by any universal step-by-step codified procedure. Examples include the Halting Problem and the Travelling Salesperson Problem. If these simple challenges have no algorithm, we need be more sober in our expectations of computerised intelligence.
A key limitation of any programmed algorithm is that it must make its decisions using a fixed set of inputs that are known and fully characterised (by the programmer) at design time. If you spring an unexpected input on any computer, it can fail, and yet that's what life is all about -- surprises. No mathematician seriously claims that what humans do is somehow magic; most believe we are computers made of meat. Nevertheless, when paradoxes like the Halting Problem abound, we can be sure that computing and cognition are not what they seem. We should hope these conundrums are better understood before putting too much faith in computers doing deep human work.
And yet, predictably, futurists are jumping ahead to imagine "Watson apps" in which patients access the supercomputer for themselves. Even if there were reliable algorithms for doctoring, I reckon the "Watson app" is a giant step, because of the complex way the patient's conditions are assessed and data is gathered for the diagnosis. That is, the taking of the medical history.
In these days of billion dollar investments in electronic health records (EHRs), we tend to think that medical decisions are all about the data. When politicians announce EHR programs they often boast that patients won't have to go through the rigmarole of giving their history over and over again to multiple doctors as they move through an episode of care. This is actually a serious misunderstanding of the importance in clinical decision-making of the interaction between medico and patient when the history is taken. It's subtle. The things a patient chooses to tell, the things they seem to be hiding, and the questions that make them anxious, all guide an experienced medico when taking a history, and provide extra cues (metadata if you will) about the patient’s condition.
Now, Watson may well have the ability to navigate this complexity and conduct a very sophisticated Q&A. It will certainly have a vastly bigger and more reliable memory of cases than any doctor, and with that it can steer a dynamic patient questionnaire. But will Watson be good enough to be made available direct to patients through an app, with no expert human mediation? Or will a host of new input errors result from patients typing their answers into a smart phone or speaking into a microphone, without any face-to-face subtlety (let alone human warmth)? It was true of mainframes and it’s just as true of the best A.I.: Bulldust in, bulldust out.
Finally, Watson's existing linguistic limitations are not to be underestimated. It is surely not trivial that Watson struggles with puns and humour. Futurist Mark Pesce when discussing Watson remarked in passing that scientists don’t understand the "quirks of language and intelligence" that create humour. The question of what makes us laugh does in fact occupy some of the finest minds in cognitive and social science. So we are a long way from being able to mechanise humour. And this matters because for the foreseeable future, it puts a great deal of social intercourse beyond AI's reach.
In between the extremes of laugh-out-loud comedy and a doctor’s dry written notes lies a spectrum of expressive subtleties, like a blush, an uncomfortable laugh, shame, and the humiliation that goes with some patients’ lived experience of illness. Watson may understand the English language, but does it understand people?
Watson can answer questions, but good doctors ask a lot of questions too. When will this amazing computer be able to hold the sort of two-way conversation that we would call a decent "bedside manner"?
Have a disruptive technology implementation story? Get recognised for your leadership. Apply for the 2014 SuperNova Awards for leaders in disruptive technology.
Multi-disciplined healthcare is standard practice today. Yet an important legal precedent to do with information sharing shows how important it is that practitioners do not presuppose how patients weigh health outcomes relative to privacy. As debate continues over opt-in and opt-out models for Patient Controlled Electronic Health Records, the lessons of this case should be re-visited, because it was sympathetic to a patient's right to withhold certain information from their carers for privacy reasons.
In 2004, an oncology patient KJ was being treated at a hospital west of Sydney by a multi-disciplined care team. At one point she consulted with a psychiatrist. Sometime later, notes of her psychiatric sessions were shared with others in the oncology team. KJ objected and complained to the NSW Administrative Decisions Tribunal that her privacy had been violated. Hospital management defended the sharing on the basis that it was normal in modern multi-disciplined healthcare and that it therefore represented reasonable Use of personal information under privacy legislation. However, the tribunal agreed with KJ that she should have been informed in advance that her psychiatric file would be shared with others. That is, the tribunal found that sharing patient information even with other professionals in the same facility constituted Disclosure of Personal Information and not just Use.
In broad terms, under Australian privacy laws, the Disclosure of Sensitive Personal Information generally requires the consent of the individual concerned, whereas Use does not, because it is related to the primary purpose for collection and would be regarded as reasonable by the individual concerned.
There is no argument that the exchange of health information with colleagues caring for the same patient is inherent to most good medical practice. Sharing information would probably be universally regarded by healthcare providers, in the context of privacy legislation, as a reasonable use, closely related to the primary purpose of collecting that information. And yet KJ v Wentworth Area Health Service recognises that the attitudes of patients as to what is reasonable may differ from those of doctors. If there is a significant risk that a given patient would not think it reasonable for information to be shared, then privacy legislation in Australia (as typified by NSW law) requires that their express consent is sought beforehand.
Many healthcare facilities in NSW responded to this case by improving their Privacy Notices. At the time of admission (and hopefully also at other times during their treatment journey) patients should be informed that their Personal Information may be disclosed to other healthcare professionals in the facility. This gives the patient the opportunity to withhold details they do not want disclosed more widely.
The tribunal noted in KJ v Wentworth Area Health Service that "while generally speaking the expression 'disclosure' refers to making personal information available to people outside an agency, in the case of large public sector agencies consisting of specialised units, the exchange of personal information between units may constitute disclosure".
In other words, lay people may perceive there to be greater "distance" between different units in the health system, even within the one hospital, than do healthcare professionals. Legally, it appears that the understandable interests of healthcare professionals to work closely together do not trump a patient's wishes to sometimes keep their Personal Information compartmentalised.
This precedent is important to the design of EHR systems, for it reminds us that the entirety of the record should not be automatically accessible by all providers. But more subtley, it also re-balances the argument often advanced by doctors that opt-in may be injurious because patients might not make the best decisions if they pick-and-choose what parts of their story to include in the EHR. Even if that clinical risk is real, the ruling in KJ vs Wentworth Area Health Service would appear to empower patients to do just that.
In my view, the resolution of this tension lies in better communication, and good faith. What matters above all in electronic health is trust and participation. We know that patients who fear for their privacy will actually decline treatment if they do not trust that their Personal Information will be safe. Whether an EHR is technically opt-in or opt-out doesn't matter in the long run if patients exercise their ultimate right to just stay away. Privacy anxieties may be especially acute around mental health, sexual assault, drug and alcohol abuse and so on. It is imperative for the public health benefits expected from e-health that patients with these sorts of conditions have faith in EHRs and do not simply drop out.
Reference: Case Note: KJ v Wentworth Area Health Service, NSWADT 84, Privacy NSW; Date of Decision: 3 May 2004