Lockstep


Yet another anonymity promise broken

In 2016, the Australian government released, for research purposes, an extract of public health insurance data, comprising the 30-year billing history of ten percent of the population, with medical providers and patients purportedly de-identified. Melbourne University researcher Dr Vanessa Teague and her colleagues famously found quite quickly that many of the providers were readily re-identified. The dataset was withdrawn, though not before many hundreds of copies were downloaded from the government website.

The government’s responses to the re-identification work were emphatic but sadly not positive. For one thing, legislation was drafted to criminalise the re-identification of ostensibly ‘anonymised’ data, which would frustrate work such as Teague’s regardless of its probative value to ongoing privacy engineering (the bill has yet to be passed). For another, the Department of Health insisted that no patient information had been compromised.

It seems less ironic than inevitable that the patients’ anonymity could not, in fact, be taken as read. In follow-up work released today, Teague, with Dr Chris Culnane and Dr Ben Rubinstein, has published a paper showing how patients in that data release may indeed be re-identified.

The ability to re-identify patients from this sort of Open Data release is frankly catastrophic. The release of imperfectly de-identified healthcare data poses real dangers to patients with socially difficult conditions. This is surely well understood. What we now need to contend with is the question of whether Open Data practices like this deliver benefits that justify the privacy risks. That’s going to be a tricky debate, for the belief in data science is bordering on religious.

It beggars belief that any government official would promise "anonymity" any more. These promises just cannot be kept.

Re-identification has become a professional sport. Researchers are constantly finding artful new ways to triangulate individuals’ identities, drawing on diverse public information, ranging from genealogical databases to social media photos. But it seems that no matter how many times privacy advocates warn against these dangers, the Open Data juggernaut just rolls on. Concerns are often dismissed as academic, or being trivial compared with the supposed fruits of research conducted on census data, Medicare records and the like.
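To make the linkage-attack idea concrete, here is a minimal sketch in Python. The dataset, field names and "public knowledge" are all invented for illustration; this is not the MBS/PBS release, just the shape of the technique researchers use.

```python
# Minimal linkage-attack sketch: joining quasi-identifiers in a
# "de-identified" extract against outside knowledge. All data invented.

deidentified_claims = [
    {"patient_id": "P-071", "birth_year": 1957, "postcode": "3052",
     "procedure": "cardiac stent", "service_date": "2014-08-19"},
    {"patient_id": "P-112", "birth_year": 1983, "postcode": "2041",
     "procedure": "knee reconstruction", "service_date": "2015-03-02"},
]

# Facts about one person gleaned from public sources, e.g. a news story:
# "our local MP, born in 1957, had a stent fitted in August 2014".
public_knowledge = {"birth_year": 1957, "postcode": "3052",
                    "procedure": "cardiac stent"}

def matches(record, knowledge):
    """True if every publicly known attribute agrees with the record."""
    return all(record.get(k) == v for k, v in knowledge.items())

candidates = [r for r in deidentified_claims if matches(r, public_knowledge)]

if len(candidates) == 1:
    # A unique match exposes the entire billing history behind that ID.
    print("Re-identified:", candidates[0]["patient_id"])
else:
    print(len(candidates), "candidate records - not yet unique")
```

The point is that names are not what betray identity: a handful of attributes, each unremarkable on its own, can be jointly unique in a population, and once one record is pinned down, the whole linked history comes with it.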

In "Health Data in an Open World (PDF)" Teague et al warn (not for the first time) that "there is a misconception that [protecting the privacy of individuals in these datasets] is either a solved problem, or an easy problem to solve” (p2). They go on to stress “there is no good solution for publishing sensitive unit-record level data that protects privacy without substantially degrading the usefulness of the data" (p3).

What is the cost-benefit of the research done on these data releases? Statisticians and data scientists say their work informs government policy, but is that really true? Let’s face it. "Evidence based policy" has become quite a joke in Western democracies. There are umpteen really big public interest issues where science and evidence are not influencing policy settings at all. So I am afraid statisticians need to be more modest about the practical importance of their findings when they mount bland “balance” arguments that the benefits outweigh the risks to privacy.

If there is a balance to be struck, then the standard way to make the calculation is a Privacy Impact Assessment (PIA). A PIA can formally assess the risk of “de-identified” data being re-identified, and, where that risk is real, recommend additional layered protections for privacy.

So where are all the PIAs?

Open Data is almost a religion. Where is the evidence that evidence-based policy making really works?

I was a scientist and I remain a whole-hearted supporter of publicly funded research. But science must be done with honest appraisal of the risks. It is high time for government officials to revisit their pat assertions of privacy and security. If the public loses confidence in the health system's privacy protection, then some people with socially problematic conditions might simply withdraw from treatment, or hold back vital details when they engage with healthcare providers. In turn, that would clearly damage the purported value of the data being collected and shared.

Big Data-driven research on massive public data sets just seems a little too easy to me. We need to discuss alternatives to massive public releases. One option is to confine research data extracts to secure virtual data rooms, and grant access only to a small number of specially authorised researchers, who would be closely monitored and audited, and bound by legally enforceable terms and conditions.

There are compromises we all need to make in research on human beings. Let’s be scientific about science-based policy. Let’s rigorously test our faith in Open Data, and let’s please stop taking “de-identification” for granted. It’s really something of a magic spell.

Posted in Big Data, Government, Privacy

The myth of the informed Internet user

Yet another Facebook ‘People You May Know’ scandal broke recently when a sex worker found that the social network was linking her clients to her “real identity”. Kashmir Hill reported the episode for Gizmodo.

This type of thing has happened before. In 2012, a bigamist was outed when his two wives were sent friend-suggestions. In 2016, Facebook introduced a psychiatrist’s patients to each other (Kash Hill again). I actually predicted that scenario back in 2010, in a letter to the British Medical Journal.

Facebook’s self-serving philosophy that there should be no friction and no secrets online has created this slippery slope, where the most tenuous links between people are presumed by the company to give it license to join things up. But note carefully that exposing ‘People You May Know’ (PYMK) is the tip of the iceberg; the chilling thing is that Facebook’s Big Data algorithms will be making myriad connections behind the scenes, long before it gets around to making introductions. Facebook is dedicated to the covert refining of all the things it knows about us, in an undying effort to value-add its information assets.

It has long been understood that Facebook has no consent to make these linkages. I wrote about the problem in a chapter of the 2013 Encyclopedia of Social Network Analysis and Mining (recently updated): “The import of a user’s contacts and use for suggesting friends represent a secondary use of Personal Information of third parties who may not even be Facebook members themselves and are not given any notice much less the opportunity to expressly consent to the collection.” Relatedly, Facebook also goes too far when it makes photo tag suggestions, by running its biometric face recognition algorithms in the background, a practice outlawed by European privacy authorities.

We can generalise this issue, from the simple mining of contact lists, to the much more subtle collection of synthetic personal data. If Facebook determines through its secret Big Data algorithms that a person X is somehow connected to member Y, then it breaches X’s privacy to “out” them. There can be enormous harm, as we’ve seen in the case of the sex worker, if someone’s secrets are needlessly exposed, especially without warning. Furthermore, note that the technical privacy breach is deeper and probably more widespread: under most privacy laws worldwide, merely making a new connection in a database synthesizes personal information about people, without cause and without consent. I’ve called this algorithmic collection and it runs counter to the Collection Limitation principle.
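As a toy illustration of how such a connection gets synthesized (the names and numbers below are invented, and Facebook's actual PYMK signals are not public), consider nothing more than two uploaded address books:

```python
# Illustration only: mining uploaded contact lists synthesizes a connection
# between people who never disclosed it. All names and numbers are invented;
# Facebook's real PYMK signals are not public.

uploaded_contacts = {
    "alice": {"+61400000001", "+61400000002"},   # Alice's address book
    "bob":   {"+61400000002", "+61400000003"},   # Bob's address book
}

def inferred_links(contacts):
    """Yield pairs of users whose uploaded address books overlap."""
    users = sorted(contacts)
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            shared = contacts[u] & contacts[v]
            if shared:
                # This line creates brand new personal information about u, v
                # and the owner of the shared number - none of whom consented.
                yield u, v, shared

for u, v, shared in inferred_links(uploaded_contacts):
    print(f"{u} and {v} appear to be connected via {sorted(shared)}")
```

The inference is trivial to compute, yet neither party supplied it, which is exactly what makes it an algorithmic collection.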

This latest episode serves another purpose: it exposes the lie that people online are fully aware of what they’re getting themselves into.

There’s a bargain at the heart of the social Internet, where digital companies provide fabulous and ostensibly free services in return for our personal information. When challenged about the fairness of this trade, the data barons typically claim that savvy netizens know there is no such thing as a free lunch, and are fully aware of how the data economy works.

But that’s patently not the case. The data supply chain is utterly opaque. In Kash Hill’s article, she can’t figure out how Facebook has made the connection between a user’s carefully anonymous persona and her “real life” account (and Facebook isn’t willing to explain the “more than 100 signals that go into PYMK”). If this is a mystery to Hill, then it’s way beyond the comprehension of 99% of the population.

The asymmetry in the digital economy is obvious, when the cleverest data scientists in the world are concentrated not in universities but in digital businesses (where they work on new ways to sell ads). Data is collected, synthesized, refined, traded and integrated, all behind our backs, in ever more complex, proprietary and invisible ways. If data is “the new crude oil”, then we’re surely approaching crunch time, when this vital yet explosive raw material needs better regulating.

Posted in Facebook, Privacy

Award winning blockchain paper at HIMSSAP17

David Chou, CIO at Children’s Mercy Hospital Kansas City, and I wrote a paper “How Healthy is Blockchain Technology?” for the HIMSS Asia Pacific 17 conference in Singapore last week. The paper is a critical analysis of the strategic potential for current blockchains in healthcare applications, with a pretty clear conclusion that the technology is largely misunderstood, and on close inspection, not yet a good fit for e-health.

And we were awarded Best Paper at the conference!

The paper will be available soon from the conference website. The abstract and conclusions are below, and if you’d like a copy of the full paper in the meantime, please reach out to me at Steve@ConstellationR.com.

Abstract

Blockchain captured the imagination with a basket of compelling and topical security promises. Many of its properties – decentralization, security and the oft-claimed “trust” – are highly prized in healthcare, and as a result, interest in this technology is building in the sector. But on close inspection, first generation blockchain technology is not a solid fit for e-health. Born out of the anti-establishment cryptocurrency movement, public blockchains remove ‘people’ and ‘process’ from certain types of transactions, but their properties degrade or become questionable in regulated settings where people and process are realities. Having inspired a new wave of innovation, blockchain technology needs significant work before it addresses the broad needs of the health sector. This paper recaps what blockchain was for, what it does, and how it is evolving to suit non-payments use cases. We critically review a number of recent blockchain healthcare proposals, selected by a US Department of Health and Human Services innovation competition, and dissect the problems they are trying to solve.

Discussion

When considering whether first generation blockchain algorithms have a place in e-health, we should bear in mind what they were designed for and why. Bitcoin and Ethereum are intrinsically political and libertarian; their outright rejection of central authority is a luxury only possible in the rarefied world of cryptocurrency but is simply not rational in real world healthcare, where accountability, credentialing and oversight are essentials.

Despite blockchain’s ability to transact and protect pure “math-based money”, it is a mistake to think public blockchains create trust, much less that they might disrupt existing trust relationships and authority structures in healthcare. Blockchain was designed on an assumption that participants in a digital currency would not trust each other, nor want to know anything about each other (except for a wallet address). On its own, blockchain does not support any other real world data management.

The newer Synchronous Ledger Technologies – including R3 Corda, Microsoft’s Blockchain as a Service, Hyperledger Fabric and IBM’s High Security Blockchain Network – are driven by deep analysis of the strengths and weaknesses of blockchain, and then re-engineering architectures to deliver similar benefits in use cases more complex and more nuanced than lawless e-cash. The newer applications involve orchestration of data streams being contributed by multiple parties (often in “coopetition”) with no one leader or umpire. Like the original blockchain, these ledgers are much more than storage media; their main benefit is that they create agreement about certain states of the data. In healthcare, this consensus might be around the order of events in a clinical trial, the consent granted by patients to various data users, or the legitimacy of serial numbers in the pharmaceuticals supply chain.

Conclusion

We hope healthcare architects, strategic planners and CISOs will carefully evaluate how blockchain technologies across what is now a spectrum of solutions apply in their organizations, and understand the work entailed to bring solutions into production.

Blockchain is no silver bullet for the challenges in e-health. We find that current blockchain solutions will not dramatically change the way patient information is stored, because most people agree that personal information does not belong on blockchains. And it won’t dispel the semantic interoperability problems of e-health systems; these are outside the scope of what blockchain was designed to do.

However newer blockchain-inspired Synchronous Ledger Technologies show great potential to address nuanced security requirements in complex networks of cooperating/competing actors. The excitement around the first blockchain has been inspirational, and is giving way to earnest sector-specific R&D with benefits yet to come.

Posted in Security, Privacy, Innovation, e-health, Blockchain

Blending security and privacy

An extract from my chapter “Blending the practices of Privacy and Information Security to navigate Contemporary Data Protection Challenges” in the new book “Trans-Atlantic Data Privacy Relations as a Challenge for Democracy”, Kloza & Svantesson (editors), Intersentia, 2017.

The relationship between privacy regulators and technologists can seem increasingly fraught. A string of adverse (and sometimes counterintuitive) privacy findings against digital businesses – including the “Right to be Forgotten”, and bans on biometric-powered photo tag suggestions – have left some wondering if privacy and IT are fundamentally at odds. Technologists may be confused by these regulatory developments, and as a result, uncertain about their professional role in privacy management.

Several efforts are underway to improve technologists’ contribution to privacy. Most prominent is the “Privacy by Design” movement (PbD), while a newer discipline of ‘privacy engineering’ is also striving to emerge. A wide gap still separates the worlds of data privacy regulation and systems design. Privacy is still not often framed in a way that engineers can relate to. Instead, PbD’s pat generalisations overlook essential differences between security and privacy, and at the same time, fail to pick up on the substantive common ground, like the ‘Need to Know’ and the principle of Least Privilege.

There appears to be a systematic shortfall in the understanding that technologists and engineers collectively have of information privacy. IT professionals routinely receive privacy training now, yet time and time again, technologists seem to misinterpret basic privacy principles, for example by exploiting personal information found in the ‘public domain’ as if data privacy principles do not apply there, or by creating personal information through Big Data processes, evidently with little or no restraint.

See also ‘Google's wifi misadventure, and the gulf between IT and Privacy’, and ‘What stops Target telling you you're pregnant?’.

The task of engaging technologists in privacy is complicated by the many mixed messages which circulate about privacy, its relative importance, and purported social trends towards promiscuity or what journalist Jeff Jarvis calls ‘publicness’. For decades, mass media headlines have regularly announced the death of privacy. When US legal scholars Samuel Warren and Louis Brandeis developed some of the world’s first privacy jurisprudence in 1890, the social fabric was under threat from the new technologies of photography and the telegraph. In time, computers became the big concern. The cover of Newsweek magazine on 27 July 1970 featured a cartoon couple cowering before mainframe computers and communications technology, under the urgent upper case headline, ‘IS PRIVACY DEAD?’. Of course it’s a rhetorical question, and after a hundred years the answer is still no.

In my new paper published as a chapter of the book “Trans-Atlantic Data Privacy Relations as a Challenge for Democracy”, I review how engineers tend collectively to regard privacy and explore how to make privacy more accessible to technologists. As a result, difficult privacy territory like social networking and Big Data may become clearer to non-lawyers, and the transatlantic compliance challenges might yield to data protection designs that are more fundamentally compatible across the digital ethos of Silicon Valley and the privacy activism of Europe.

Privacy is contentious today. There are legitimate debates about whether the information age has brought real changes to privacy norms or not. Regardless, with so much personal information leaking through breaches, accidents, or digital business practices, it’s often said that ‘the genie is out of the bottle’, meaning privacy has become hopeless. Yet in Europe and many jurisdictions, privacy rights attach to Personal Information no matter where it comes from. The threshold for data being counted as Personal Information (or equivalently in the US, ‘Personally Identifiable Information’) is low: any data about a person whose identity is readily apparent constitutes Personal Information in most places, regardless of where or how it originated, and without any reference to who might be said to ‘own’ the data. This is not obvious to engineers without legal training, who have formed a more casual understanding of what ‘private’ means. So it may strike them as paradoxical that the terms ‘public’ and ‘private’ don’t even figure in laws like Australia’s Privacy Act.

Probably the most distracting message for engineers is the well-intended suggestion ‘Privacy is not a Technology Issue’. In 2000, IBM chair Lou Gerstner was one of the first high-profile technologists to isolate privacy as a policy issue. The same trope (that such-and-such ‘is not a technology issue’) is widespread in online discourse. It usually means that multiple disciplines must be brought to bear on certain complex outcomes, such as safety, security or privacy. Unfortunately, engineers can take it to mean that privacy is covered by other departments, such as legal, and has nothing to do with technology at all.

In fact all of our traditional privacy principles are impacted by system design decisions and practices, and are therefore apt for engagement by information technologists. For instance, IT professionals are liable to think of ‘collection’ as a direct activity that solicits Personal Information, whereas under technology neutral privacy principles, indirect collection of identifiable audit logs or database backups should also count.
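A tiny, invented example of what that means in practice: a routine web-server access log is itself a collection of Personal Information, even though nobody filled in a form.

```python
# A routine audit log quietly "collects" Personal Information.
# Common Apache-style format; the entry itself is invented.
log_line = ('203.0.113.7 - jdoe [12/Oct/2017:11:03:44 +1000] '
            '"GET /results/hiv-test HTTP/1.1" 200 5123')

ip, _, username, *rest = log_line.split()
requested = log_line.split('"')[1]

# Account name + source IP + timestamp + the URL requested are readily
# linkable to one person, so writing this line to disk is an (indirect)
# collection of Personal Information under technology neutral principles.
print({"who": username, "from": ip, "what": requested})
```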

The most damaging thing that technologists hear about privacy could be the cynical idea that ‘Technology outpaces the Law’. While we should not underestimate how cyberspace will affect society and its many laws born in earlier ages, in practical day-to-day terms it is the law that challenges technology, not the other way round. The claim that the law cannot keep up with technology is often a rhetorical device used to embolden developers and entrepreneurs. New technologies can make it easier to break old laws, but the legal principles in most cases still stand. If privacy is the fundamental ‘right to be let alone’, then there is nothing intrinsic to technology that supersedes that right. It turns out that technology neutral privacy laws framed over 30 years ago are powerful against very modern trespasses, like wi-fi snooping by Google and over-zealous use of biometrics by Facebook. So technology in general might only outpace policing.

We tend to sugar-coat privacy. Advocates try to reassure harried managers that ‘privacy is good for business’ but the same sort of naïve slogan only undermined the quality movement in the 1990s. In truth, what’s good for business is peculiar to each business. It is plainly the case that some businesses thrive without paying much attention to privacy, or even by mocking it.

Let’s not shrink from the reality that privacy creates tensions with other objectives of complex information systems. Engineering is all about resolving competing requirements. If we’re serious about ‘Privacy by Design’ and ‘Privacy Engineering’, we need to acknowledge the inherent tensions, and equip designers with the tools and the understanding to optimise privacy alongside all the other complexities of modern information systems.

A better appreciation of the nature of Personal Information and of technology-neutral data privacy rules should help to demystify European privacy rulings on matters such as facial recognition and the Right to be Forgotten. The treatment of privacy can then be lifted from a defensive compliance exercise, to a properly balanced discussion of what organisations are seeking to get out of the data they have at their disposal.

Posted in Big Data, Biometrics, Privacy, RTBF, Social Media

A hidden message from Ed Snowden to the Identerati

The KNOW Identity Conference in Washington DC last week opened with a keynote fireside chat between tech writer Manoush Zomorodi and Edward Snowden.

Once again, the exiled security analyst gave us a balanced and nuanced view of the state of security, privacy, surveillance, government policy, and power. I have always found him to be a rock-solid voice of reason. Like most security policy analysts, Snowden sees security and privacy as symbiotic: they can be eroded together, and they must be bolstered together. When asked (inevitably) about the "security-privacy balance", Snowden rejects the premise of the question, as many of us do, but he has an interesting take, arguing that governments tend to surveil rather than secure.

The interview was timely for it gave Snowden the opportunity to comment on the “WannaCry” ransomware episode which affected so many e-health systems recently. He highlighted the tragedy that cyber weapons developed by governments keep leaking and falling into the hands of criminals. For decades, there has been an argument that cryptography is a type of “Dual-Use Technology”; like radio-isotopes, plastic explosives and supercomputers, it can be used in warfare, and thus the NSA and other security agencies try to include encryption in the “Wassenaar Arrangement” of export restrictions. The so-called “Crypto Wars” policy debate is usually seen as governments seeking to stop terrorists from encrypting their communications. But even if crypto export control worked, it would not address security agencies’ carelessness with their own cyber weapons.

But identity was the business of the conference. What did Snowden have to say about that?

  • Identifiers and identity are not the same thing. Identifiers are for computers but “identity is about the self”, to differentiate yourself from others.
  • Individuals need names, tokens and cryptographic keys, to be able to express themselves online, to trade, to exchange value.
  • “Vendors don’t need your true identity”; notwithstanding legislated KYC rules for some sectors, unique identification is rarely needed in routine business.
  • Historically, identity has not been a component of many commercial transactions.
  • The original Web of Trust, for establishing a level of confidence in people through mutual attestation, was “crude and could not scale”. But new “programmatic, frictionless, decentralised” techniques are possible.
  • He thought a “cloud of verifiers” in a social fabric could be more reliable, to avoid single points of failure in identity.
  • When pressed, Snowden said he was not actually thinking of blockchain (and that he saw blockchain as being specifically good for showing that “a certain event happened at a certain time”).

Now, what are identity professionals to make of Ed Snowden’s take on all this?

For anyone who has worked in identity for years, he said nothing new, and the identerati might be tempted to skip Snowden. On the other hand, in saying nothing new, perhaps Snowden has shown that the identity problem space is fully defined.

There is a vital meta-message here.

In my view, identity professionals still spend too much time in analysis. We’re still writing new glossaries and standards. We’re still modelling. We’re still working on new “trust frameworks”. And all for what? Let’s reflect on the very ordinariness of Snowden’s account of digital identity. He’s one of the sharpest minds in security and privacy, and yet he doesn’t find anything new to say about identity. That’s surely a sign of maturity, and that it’s time to move on. We know what the problem is: What facts do we need about each other in order to deal digitally, and how do we make those facts available?

Snowden seems to think it’s not that complicated a question, and I would agree with him.

Posted in Security, Privacy, Identity, Government

A critique of Privacy by Design

Or Reorientating how engineers think about privacy.

From my chapter Blending the practices of Privacy and Information Security to navigate Contemporary Data Protection Challenges in “Trans-Atlantic Data Privacy Relations as a Challenge for Democracy”, Kloza & Svantesson (editors), in press.

One of the leading efforts to inculcate privacy into engineering practice has been the “Privacy by Design” movement. Commonly abbreviated “PbD”, it is a set of guidelines developed in the 1990s by the then privacy commissioner of Ontario, Ann Cavoukian. The movement seeks to embed privacy “into the design specifications of technologies, business practices, and physical infrastructures”. PbD is basically the same good idea as building in security, or building in quality, because retrofitting these things late in the design lifecycle leads to higher costs* and compromised, sub-optimal outcomes.

Privacy by Design attempts to orientate technologists to privacy with a set of simple maxims:

    1. Proactive not Reactive; Preventative not Remedial
    2. Privacy as the Default Setting
    3. Privacy Embedded into Design
    4. Full Functionality – Positive-Sum, not Zero-Sum
    5. End-to-End Security – Full Lifecycle Protection
    6. Visibility and Transparency – Keep it Open
    7. Respect for User Privacy – Keep it User-Centric.

PbD is a well-meaning effort, and yet its language comes from a culture quite different from engineering. PbD’s maxims rework classic privacy principles without providing much that’s tangible to working systems designers.

The most problematic aspect of Privacy by Design is its idealism. Politically, PbD is partly a response to the cynicism of national security zealots and the like who tend to see privacy as quaint or threatening. Infamously, NSA security consultant Ed Giorgio was quoted in “The New Yorker” of 21 January 2008 as saying “privacy and security are a zero-sum game”. Of course most privacy advocates (including me) find that proposition truly chilling. And yet PbD’s response is frankly just too cute with its slogan that privacy is a “positive sum game”.

The truth is privacy is full of contradictions and competing interests, and we ought not sugar coat it. For starters, the Collection Limitation principle – which I take to be the cornerstone of privacy – can contradict the security or legal instinct to always retain as much data as possible, in case it proves useful one day. Disclosure Limitation can conflict with usability, because Personal Information may become siloed for privacy’s sake and less freely available to other applications. And above all, Use Limitation can restrict the revenue opportunities that digital entrepreneurs might otherwise see in all the raw material they are privileged to have gathered.

Now, by highlighting these tensions, I do not for a moment suggest that arbitrary interests should override privacy. But I do say it is naive to flatly assert that privacy can be maximised along with any other system objective. It is better that IT designers be made aware of the many trade-offs that privacy can entail, and that they be equipped to deal with real world compromises implied by privacy just as they do with other design requirements. For this is what engineering is all about: resolving conflicting requirements in real world systems.

So a more sophisticated approach than “Privacy by Design” is privacy engineering, in which privacy can take its place within information systems design alongside all the other practical considerations that IT professionals weigh up every day, including usability, security, efficiency, profitability, and cost.

See also my "Getting Started Guide: Privacy Engineering" from Constellation Research.

*Footnote: Not unrelatedly, I wonder if we should re-examine the claim that retrofitting privacy, security and/or quality after a system has been designed and realised leads to greater cost! Cold hard experience might suggest otherwise. Clearly, a great many organisations persist with bolting on these sorts of features late in the day -- or else advocates wouldn't have to keep telling them not to. And the Minimum Viable Product movement is almost a license to defer quality and other non-essential considerations. All businesses are cost conscious, right? So averaged across a great many projects over the long term, could it be that businesses have in fact settled on the most cost effective timing of security engineering, and it's not as politically correct as we'd like?!

Posted in Software engineering, Privacy, Innovation

Blockchain, Healthcare and the Bleeding Edge of R&D

Last month, over September 26-27, I attended a US government workshop on The Use of Blockchain in Healthcare and Research, organised by the Department of Health & Human Services Office of the National Coordinator (ONC) and hosted at NIST headquarters at Gaithersburg, Maryland. The workshop showcased a number of winning entries from ONC's Blockchain Challenge, and brought together a number of experts and practitioners from NIST and the Department of Homeland Security.

I presented an invited paper "Blockchain's Challenges in Real Life" (PDF) alongside other new research by Mance Harmon from Ping Identity, and Drummond Reed from Respect Network. All the workshop presentations, the Blockchain Challenge winners' papers and a number of the unsuccessful submissions are available on the ONC website. You will find contributions from major computer companies and consultancies, leading medical schools and universities, and a number of unaffiliated researchers.

I also sat on a panel session about identity innovation, joining entrepreneurs from Digital Bazaar, Factom, Respect Network, and XCELERATE, all of which are conducting R&D projects funded by the DHS Science and Technology division.

Around the same time as the workshop, I happened to finalise two new Constellation Research papers, on security and R&D practices for blockchain technologies. And that was timely, because I am afraid that once again, I have immersed myself in some of the most current blockchain thinking, only to find that key pieces of the puzzle are still missing.

Disclosure: I traveled to the Blockchain in Healthcare workshop as a guest of ONC, which paid for my transport and accommodation.

Three observations from the Workshop

There were two things I just did not get as I read the winning Blockchain Challenge papers and listened to the presentations, and one crucial element that most of the proposals are missing.

Firstly, one of the most common themes across all of the papers was interoperability. A great challenge in e-health is indeed interoperability. Disparate health systems speak different languages, using different codes for the same medical procedures. Adoption of new standard terminologies and messaging standards, like HL7 and ICD, is infamously slow, often taking a decade or longer. Large clinical systems are notoriously complex to implement, so along the way they invariably undergo major customisation, which makes each installation peculiar to its setting, and resistant to interfacing with other systems.

In the USA, Health Information Exchanges (HIEs) have been a common response to these problems, the idea being that an intermediary switching system can broker understanding between local e-health programs. But as anyone in the industry knows, HIEs have been easier said than done, to say the least.

According to many of the ONC challenge papers, blockchain is supposed to bring a breakthrough, yet no one has explained how a ledger will make the semantics of all these e-health silos suddenly compatible. Blockchain is a very specific protocol that addresses the order of entries in a distributed ledger, to prevent Double Spend without an administrator. Nothing about blockchain's fundamentals relates to the contents of messages, healthcare semantics, medical codes and so on. It just doesn't "do" interoperability! The complexity in healthcare is intrinsic to the subject matter; it cannot be willed away with any new storage technology.
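As a rough sketch of that point (hypothetical entries, no consensus, signatures or networking), a hash-chained ledger happily fixes the order and integrity of whatever is appended while remaining oblivious to what it means:

```python
# Minimal hash-chained ledger: it guarantees the order and integrity of
# opaque payloads, and nothing about their meaning. Entries are invented.
import hashlib, json

def make_block(prev_hash, payload):
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return {"prev": prev_hash, "payload": payload,
            "hash": hashlib.sha256(body.encode()).hexdigest()}

chain = [make_block("0" * 64, "genesis")]
# Two clinical entries in incompatible local codings - the ledger cannot tell.
chain.append(make_block(chain[-1]["hash"], {"system": "MBS", "item": "30473"}))
chain.append(make_block(chain[-1]["hash"], {"icd10": "K35.80", "note": "appendicitis?"}))

# Order and integrity are verifiable...
for prev, block in zip(chain, chain[1:]):
    assert block["prev"] == prev["hash"]
# ...but nothing here reconciles MBS items with ICD-10 codes:
# semantic interoperability is simply out of scope.
```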

The second thing I just didn't get about the workshop was the idea that blockchain will fix healthcare information silos. Several speakers stressed the problem that data is fragmented, concentrated in local repositories, and hard to find when needed. All true, but I don't see what blockchain can do about this. A consensus was reached at the workshop that personal information and Protected Health Information (PHI) should not be stored on the blockchain in any significant amounts (not just because of its sensitivity but also the sheer volume of electronic health records and images in particular). So if we're agreed that the blockchain could only hold pointers to health data, what difference can it make to the current complex of record systems?

[Figures: Accenture “Blockchain for Healthcare” diagrams – current state and target state.]


And my third problem at the workshop was the stark omission of key management. This is the central administrative challenge in any security system: getting the right cryptographic keys and credentials into the right hands, so all parties can be sure who they are dealing with. The genius of the original Bitcoin blockchain is that it allows people to exchange guaranteed value without needing to know anything about each other; it actually dispenses with key management, and may be unique in the history of security for doing so (see also Blockchain has no meaning). But when we do need to know who's who in a health system – to be certain when various users really are authorised medicos, researchers, insurers or patients – then key management must return to the mix. And then things get complicated, much more complicated than the utopian setting of Bitcoin.
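A toy illustration of what “doing away with key management” looks like (stand-in key bytes, not the real Bitcoin address encoding):

```python
# In Bitcoin-style blockchains an "identity" is just a hash of a public key,
# bound to no real-world party. Stand-in key bytes here; a real wallet uses
# secp256k1 keys and several extra encoding steps.
import hashlib, os

public_key = os.urandom(33)                       # stand-in for a public key
address = hashlib.sha256(public_key).hexdigest()[:40]
print("Pay to:", address)

# Whoever holds the matching private key can spend - and that is the entire
# authorisation model. Nothing ties the address to a clinician, insurer or
# patient. The moment a health system must know that a key belongs to an
# authorised medico, someone has to register, vouch for, revoke and recover
# keys: key management is back, along with the third parties it implies.
```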

Moreover, healthcare is hierarchical. Inherent to the system are management structures, authorizations, credentialing bodies, quality assurance and audits – all the things that blockchain's creator Satoshi Nakamoto expressly tried to get rid of. As I explained in my workshop speech, if a blockchain deployment still has to involve third parties, then the benefits of the algorithm are lost. So said Nakamoto him/herself!

[Slide from the author’s presentation “Blockchain and Healthcare”, NIST/ONC workshop, 26 September 2016: the role of third parties.]

In my view, most blockchain for healthcare projects will discover, sooner or later, that once the necessary key management arrangements are taken care of, their choice of distributed ledger technology becomes inconsequential.

New Constellation Research on Blockchain Technologies

How to Secure Blockchain Technologies

Security for blockchains and Distributed Ledger Technologies (DLTs) has evolved quickly. As soon as interest in blockchain grew past cryptocurrency into mainstream business applications, it became apparent that the core ledger would need to be augmented with permissions for access control, and encryption for confidentiality. But what few people appreciate is that these measures conflict with the rationale of the original blockchain algorithm, which was expressly meant to dispel administration layers. The first of my new papers looks at these tensions, what they mean for public and private blockchain systems, and paints a picture of third generation DLTs.

How to Conduct Effective Blockchain R&D

The uncomfortable marriage of ad hoc security and the early blockchain is indicative of a broader problem I've written about many times: too much blockchain "innovation" is proceeding with insufficient rigor. Which brings us to the second of my new papers. In the rush to apply blockchain to broader payments and real world assets, few entrepreneurs have been clear and precise about the problems they think they’re solving. If the R&D is not properly grounded, then the resulting solutions will be weak and will ultimately fail in the market. It must be appreciated that the original blockchain was only a prototype. Great care needs to be taken to learn from it and more rigorously adapt fast-evolving DLTs to enterprise needs.

Constellation ShortList™ for Distributed Ledger Technologies Labs

Finally, Constellation Research has launched a new product, the Constellation ShortList™. These are punchy lists by our analysts of leading technologies in dozens of different categories, which will each be refreshed on a short cycle. The objective is to help buyers of technology when choosing offerings in new areas.

My Constellation ShortList™ for blockchain-related solution providers is now available here.

Posted in Security, Privacy, e-health, Constellation Research, Blockchain

Personal Information: What's it all 'about'?

For the past few years, a crucial case has been playing out in Australia's legal system over the treatment of metadata in privacy law. The next stanza is due to be written soon in the Federal Court.

It all began when a journalist with a keen interest in surveillance, Ben Grubb, wanted to understand the breadth and depth of metadata, and so requested that mobile network operator Telstra provide him a copy of his call records. Grubb thought to exercise his rights to access Personal Information under the Privacy Act. Telstra held back a lot of Grubb's call data, arguing that metadata is not Personal Information and is not subject to the access principle. Grubb appealed to the Australian Privacy Commissioner, who ruled that metadata is identifiable and hence represents Personal Information. Telstra took their case to the Administrative Appeals Tribunal, which found in favor of Telstra, with a surprising interpretation of "Personal Information". And the Commissioner then appealed to the next legal authority up the line.

At yesterday's launch of Privacy Awareness Week in Sydney, the Privacy Commissioner Timothy Pilgrim informed us that the full bench of the Federal Court is due to consider the case in August. This could be significant for data privacy law worldwide, for it all goes to the reach of these sorts of regulations.

I always thought the nuance in Personal Information was in the question of "identifiability" -- which could be contested case by case -- and those good old ambiguous legal modifiers like 'reasonably' or 'readily'. So it was a great surprise that the Administrative Appeals Tribunal, in overruling the Privacy Commissioner in Ben Grubb v Telstra, was exercised instead by the meaning of the word "about".

Recall that the Privacy Act (as amended in 2012) defines Personal Information as:

    • "Information or an opinion about an identified individual, or an individual who is reasonably identifiable: (a) whether the information or opinion is true or not; and (b) whether the information or opinion is recorded in a material form or not."

The original question at the heart of Grubb vs Telstra was whether mobile phone call metadata falls under this definition. Commissioner Pilgrim showed that call metadata is identifiable to the caller (especially identifiable by the phone company itself that keeps extensive records linking metadata to customer records) and therefore counts as Personal Information.

When it reviewed the case, the tribunal agreed with Pilgrim that the metadata was identifiable, but in a surprise twist, found that the metadata is not actually about Ben Grubb but instead is about the services provided to him.

    • Once his call or message was transmitted from the first cell that received it from his mobile device, the [metadata] that was generated was directed to delivering the call or message to its intended recipient. That data is no longer about Mr Grubb or the fact that he made a call or sent a message or about the number or address to which he sent it. It is not about the content of the call or the message ... It is information about the service it provides to Mr Grubb but not about him. See AATA 991 (18 December 2015) paragraph 112.

To me it's passing strange that information about calls made by a person is not also regarded as being about that person. Can information not be about more than one thing, namely about a customer's services and the customer?

Think about what metadata can be used for, and how broadly-framed privacy laws are meant to stem abuse. If Ben Grubb were found, for example, to have repeatedly called the same Indian takeaway shop, would we not infer something about him and his taste for Indian food? Even if he called the takeaway shop just once, we might still conclude something about him from that tiny sample (perhaps, rightly or wrongly, that he doesn’t much care for Indian food), and remember that in Australian law, Personal Information doesn’t have to be correct to be covered.

By the AAT's logic, a doctor's appointment book would not represent any Personal Information about her patients but only information about the services she has delivered to them. But in fact the appointment list of an oncologist for instance would tell us a lot about peoples' cancer.

Given the many ways that metadata can invade our privacy (not to mention that people may be killed based on metadata) it's important that the definition of Personal Information be broad, and that it has a low threshold. Any amount of metadata tells us something about the person.

I appreciate that the 'spirit of the law' is not always what matters, but let's compare the definition of Personal Information in Australia with corresponding concepts elsewhere (see more detail beneath). In the USA, Personally Identifiable Information is any data that may "distinguish" an individual; in the UK, Personal Data is anything that "relates" to an individual; in Germany, it is anything "concerning" someone. Clearly the intent is consistent worldwide. If data can be linked to a person, then it comes under data privacy law.

Which is how it should be. Technology neutral privacy law is framed broadly in the interests of consumer protection. I hope that the Federal Court, in drilling into the definition of Personal Information, upholds what the Privacy Act is for.

Personal Information definitions around the world.

Personal Information, Personal Data and Personally Identifiable Information are variously and more or less equivalently defined as follows (references are hyperlinked in the names of each country):

United Kingdom

    • data which relate to a living individual who can be identified

Germany

    • any information concerning the personal or material circumstances of an identified or identifiable individual

Canada

    • information about an identifiable individual

United States

    • information which can be used to distinguish or trace an individual's identity ...

Australia

    • information or an opinion ... about an identified individual, or an individual who is reasonably identifiable.

Posted in Privacy

Uniquely difficult

I was talking with government identity strategists earlier this week. We were circling (yet again) definitions of identity and attributes, and revisiting the reasonable idea that digital identities are "unique in a context". Regular readers will know I'm very interested in context. But in the same session we were discussing the public's understandable anxiety about national ID schemes. And I had a little epiphany that the word "unique" and the very idea of it may be unhelpful.

The association of uniqueness with the troubling idea of national identity is not just perception; there is a real tendency for identity and access management (IDAM) systems to over-identify, with an obvious privacy penalty. Security pros tend to feel instinctively that the more they know about people, the more secure we all will be.

Whenever we think "uniqueness" is important, I wonder if there are really other more precise objectives that apply? Is "singularity" a better word for the property we're looking for? Or the mouthful "non-ambiguity"? In different use cases, what we really need to know can vary:

  • Is the person (or entity) accessing a service the same as last time?
  • Is the person exercising a credential cleared to use it? Delegation of digital identity means one entity can act for several others, complicating "uniqueness".
  • Does the Relying Party (RP) know the user well enough for the RP's purposes? That doesn't always mean uniquely.

I observe that when IDAM schemes come loaded with reference to uniqueness, it tends to bias the way RPs do their identification and risk management designs. There can arise an expectation that uniqueness is important, no matter what. Yet a great deal of fraud exploits weaknesses at transaction time, not enrollment time: no matter if you are identified uniquely, you can still get defrauded by an attacker who takes over or bypasses your authenticator. So uniqueness in and of itself doesn't always help.

If people do want to use the word "unique" then they should have the discipline to always qualify it, as mentioned, as "unique in a context".

Finally it's worth remembering that the word has long been degraded by the biometrics industry, with its habit of calling almost any biological trait "unique". There's a sad lack of precision here. No biometric as measured is ever unique! Every mode, even the much-vaunted iris, has a non-zero False Match Rate.
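A back-of-envelope calculation (using an assumed, illustrative FMR, not any vendor's figure) shows why one-to-many matching erodes any claim to uniqueness:

```python
# Probability of at least one false match when searching a gallery of size N,
# given a per-comparison False Match Rate (FMR). The FMR here is assumed.
fmr = 1e-6
for gallery_size in (1_000, 100_000, 10_000_000):
    p_any_false_match = 1 - (1 - fmr) ** gallery_size
    print(f"gallery {gallery_size:>10,}: P(false match) ~ {p_any_false_match:.1%}")
# Roughly 0.1% at a thousand, 9.5% at a hundred thousand,
# and a near certainty at ten million comparisons.
```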

What's in a word? A lot! I'd like to see more rigorous use of the word "unique". At least let's be aware of what it means subliminally. With the word bandied around so much, engineers can tend to think uniqueness is always a designed objective, and laypeople can presume that every authentication scheme is out to fingerprint them. Literally.

Posted in Privacy, Identity, Government, Biometrics, Security

The last thing privacy needs is new laws

World Wide Web inventor Sir Tim Berners-Lee has given a speech in London, re-affirming the importance of privacy, but unfortunately he has muddied the waters by casting aspersions on privacy law. Berners-Lee makes a technologist's error, calling for unworkable new privacy mechanisms where none in fact are warranted.

The Telegraph reports Berners-Lee as saying "Some people say privacy is dead – get over it. I don't agree with that. The idea that privacy is dead is hopeless and sad." He highlighted that people's participation in potentially beneficial programs like e-health is hampered by a lack of trust, and a sense that spying online is constant.

Of course he's right about that. Yet he seems to underestimate the data privacy protections we already have. Instead, according to The Telegraph, he envisions "a world in which I have control of my data. I can sell it to you and we can negotiate a price, but more importantly I will have legal ownership of all the data about me".

It's a classic case of being careful what you ask for, in case you get it. What would control over "all data about you" look like? Most of the data about us these days - most of the personal data, aka Personally Identifiable Information (PII) - is collected or created behind our backs, by increasingly sophisticated algorithms. Now, people certainly don't know enough about these processes in general, and in too few cases are they given a proper opportunity to opt in to Big Data processes. Better notice and consent mechanisms are needed for sure, but I don't see that ownership could fix a privacy problem.

What could "ownership" of data even mean? If personal information has been gathered by a business process, or created by clever proprietary algorithms, we get into obvious debates over intellectual property. Look at medical records: in Australia and I suspect elsewhere, it is understood that doctors legally own the medical records about a patient, but that patients have rights to access the contents. The interpretation of medical tests is regarded as the intellectual property of the healthcare professional.

The philosophical and legal quandaries are many. With data that is only potentially identifiable, at what point would ownership flip from the data's creator to the individual to whom it applies? What if data applies to more than one person, as in household electricity records, or, more seriously, DNA?

What really matters is preventing the exploitation of people through data about them. Privacy (or, strictly speaking, data protection) is fundamentally about restraint. When an organisation knows you, they should be restrained in what they can do with that knowledge, and not use it against your interests. And thus, in over 100 countries, we see legislated privacy principles which require that organisations only collect the PII they really need for stated purposes, that PII collected for one reason not be re-purposed for others, that people are made reasonably aware of what's going on with their PII, and so on.

Berners-Lee alluded to the privacy threats of Big Data, and he's absolutely right. But I point out that existing privacy law can substantially deal with Big Data. It's not necessary to make new and novel laws about data ownership. When an algorithm works out something about you, such as your risk of developing diabetes, without you having to fill out a questionnaire, then that process has collected PII, albeit indirectly. Technology-neutral privacy laws don't care about the method of collection or creation of PII. Synthetic personal data, collected as it were algorithmically, is treated by the law in the same way as data gathered overtly. An example of this principle is found in the successful European legal action against Facebook for automatic tag suggestions, in which biometric facial recognition algorithms identify people in photos without consent.

Technologists often underestimate the powers of existing broadly framed privacy laws, doubtless because technology neutrality is not their regular stance. It is perhaps surprising, yet gratifying, that conventional privacy laws treat new technologies like Big Data and the Internet of Things as merely potential new sources of personal information. If brand new algorithms give businesses the power to read the minds of shoppers or social network users, then those businesses are limited in law as to what they can do with that information, just as if they had collected it in person. Which is surely what regular people expect.

Posted in Privacy, e-health, Big Data