Lockstep

Mobile: +61 (0) 414 488 851
Email: swilson@lockstep.com.au

Building an orderly Digital Economy

Two new Constellation Research reports of mine cover data protection, looking at the heightened need for privacy in the face of artificial intelligence and Big Data, and the sorts of systemic infrastructure needed to safeguard data supply in future.

Big Data and AI are infamously providing corporations and governments with the means to know us "better than we know ourselves". Businesses no longer need to survey their customers to work out their product preferences, lifestyles, or even their state of health; instead, data analytics and machine learning algorithms, fueled by vast amounts of the "digital exhaust" we leave behind wherever we go online, are uncovering ever deeper insights about us. Businesses get to know us now automatically, without ever asking explicit questions.

What are the privacy implications? The good news for consumers and privacy advocates is that general data protection and privacy laws are technology neutral; they extend essentially the same protections to Personal Data that is generated automatically as they do to data collected manually. Long established privacy laws have been applied to curb the excesses of digital companies on the cutting edge of data processing. My report "Big Data – Big Privacy" examines the strengths and weaknesses of classical privacy laws.

The future of the digital economy depends on reasonable and equitable use of data as a resource. The early years of the Internet Age have seen significant exploitation of individuals by digital entrepreneurs, and stark imbalances in the riches that can be made from mining and refining information. But privacy laws are being reinforced in Europe (with the EU's General Data Protection Regulation, GDPR) and extended to places like California. This is surely a sign of the law and order to come.

The early oil rush is instructive for how the digital economy should probably evolve from here. To bring oil safely to market, the petroleum industry organised itself into complex new supply chains, for moving and processing petrochemicals. Technical standards, enforceable rules, and even social norms (like good habits for handling gasoline) developed to help keep the new supply chains orderly.

Obviously data is quite different from oil, and the comparison isn't meant to be taken too far. So what practical lessons are there from the petrochemical experience for the future organisation of the digital economy? What would data supply chains actually look like? It seems likely to me that new laws and jurisprudence will emerge to deal with data as an intangible asset class, but that's another story. For now, I start to tease out the more technological aspects of data protection in "How Data Supply Chains Must Be Safeguarded in the Digital Economy".

I start with the challenge of how to be sure about the people and entities we try to deal with in the digital environment. The Digital Identity industry has grappled with this for two decades, and its successes can be leveraged.

In the so-called "real world", commerce and government services revolve around established facts and figures about people (account numbers, customer reference numbers, employee numbers, professional qualifications, memberships, social security entitlements, driver licenses, and personal attributes like age, residency, health conditions and so on). But all these critical pieces of information lose their reliability and provenance online: we cannot tell where the information is supposed to have come from, much less can we distinguish clones and counterfeits from "originals". Nor can we be sure that data presented online truly belongs to particular individuals.

But in another setting, this is a solved problem. The susceptibility of Digital Identity data to fraud is very similar to that of credit card numbers, and we've secured them with integrated circuits and cryptography.

A credit card is nothing more than a data carrier used to present an account holder's bona fides to a merchant (within the context of an overarching scheme). Over time, the payment card industry has steadily adopted more robust forms of data carrier:

i. The original paper charge cards in the 1950s were transcribed by merchants by hand.
ii. Embossed plastic cards were "read" by carbon-paper click-clack machines.
iii. Magnetic stripe cards were read automatically by electronic terminals, which scanned data encoded in analogue magnetised patterns.
iv. Chip cards are now also read automatically, but using digital memory and mutual authentication between card and terminal.
v. Smart phones embody chips which can mimic smartcards, and bring added functionality, such as a mobile wallet that can manage multiple accounts.

Magnetic stripe cards persisted for decades until criminal skimming and carding became unbearable. Magnetic stripe fraud is enabled because a card terminal cannot tell the difference between an original analogue stripe and a copy; the data encoded in the magnetic medium has no provenance.

The whole point of a chip or smart payment card is to protect the presentation of cardholder data, to prevent interception, tampering, illicit replay, cloning and/or counterfeiting. The cardholder data in a chip card is exactly the same as that in a magnetic stripe (or on the surface of either type of card for that matter) but the data transfer protocol from chip card to terminal is special.

A chip card holds the cardholder details within an embedded microprocessor, along with one or more private keys which are unique to each cardholder. Data is not passively transferred to the terminal as it is with a mag stripe card; instead, for each transaction, there is a handshake. First the terminal sends the purchase details into the card's microprocessor, which combines them with the cardholder data and digitally signs the combination before sending it back to the terminal. This operation renders each encoded transaction unique to the card and cardholder, and prevents substitution of stolen data or tampering with the transaction.
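
To make the handshake concrete, here is a minimal sketch in Python. It is illustrative only, not the actual EMV protocol: the key type, data fields and function names are all assumptions of mine, and real cards use their own cryptogram formats.

```python
# Illustrative sketch of a chip-card-style signed transaction (not the real EMV protocol).
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The private key embedded in the card's microprocessor, unique to this cardholder.
card_private_key = Ed25519PrivateKey.generate()

cardholder_data = b"PAN=5123456789012346;NAME=J CITIZEN;EXPIRY=2027-08"

def card_sign_transaction(purchase_details: bytes) -> tuple[bytes, bytes]:
    """Inside the chip: combine the purchase details with the cardholder data and sign."""
    payload = purchase_details + b"|" + cardholder_data
    signature = card_private_key.sign(payload)
    return payload, signature

# The terminal sends the purchase details into the card...
payload, signature = card_sign_transaction(b"MERCHANT=ACME;AMOUNT=42.50;TIME=2018-05-01T10:15")
# ...and receives back a transaction that is unique to this card, cardholder and purchase,
# so stolen static data cannot be substituted and the details cannot be tampered with.
```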

As the older technology is phased out, an overall systemic improvement is that raw data becomes useless, and valueless to thieves. Best practice is that no raw card details are relied upon; instead we expect transactions to employ chips and digital signatures, to ensure provenance.

The experience of progressively tackling plastic card fraud offers lessons for economy-scale digital identity, and for data management in general. The core technique is digital signatures (applied and processed automatically by smart devices) underpinned by seamless key management (where cryptographic keys are registered to users for different applications). The provenance of all data could be safeguarded in the same way as credit card numbers are protected in the payments system.
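
As a rough sketch of how this generalises (the registry structure and function below are hypothetical, not any particular scheme's API): a relying party that holds the public key registered to a data source can check that any record is original and untampered, which is all that provenance requires.

```python
# Verifying data provenance against a registry of keys registered to users (illustrative only).
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

# Hypothetical registry: cryptographic keys registered to users/devices for given applications.
key_registry: dict[str, Ed25519PublicKey] = {}

def verify_provenance(signer_id: str, record: bytes, signature: bytes) -> bool:
    """True only if the record was signed by the key registered to the claimed signer."""
    public_key = key_registry.get(signer_id)
    if public_key is None:
        return False              # unknown origin: no provenance
    try:
        public_key.verify(signature, record)
        return True               # original and untampered
    except InvalidSignature:
        return False              # a clone, counterfeit or altered copy
```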

In my two new reports I try to balance a positive view of classical privacy regulations, with a realistic, evidence-based vision of how standard cryptographic technology can protect the provenance of all data and systematise how it flows through the digital economy.

Posted in Security, RTBF, Privacy, Internet, Identity, Fraud, Big Data, AI

Teachable moments in AI

A truism from early computing was that computers can only do what they are programmed to do. It's hackneyed, but today this limitation is more important than ever. Put starkly, no narrow artificial intelligence can learn from its experience.

Remember the chatbot Tay? Microsoft's AI experiment was intended to learn from real people by participating in Twitter, but had to be shut down after it went feral. Imagine Tay was a real child: her parents would have sat her down and explained why what she had been saying was wrong.

It is no trivial point that we don’t simply turn off an anti-social infant, but instead we expect them to grow from their missteps. In contrast, AI has no teachable moments. Real intelligence can’t be all artificial.

Posted in AI

Identity is dead

Identity is dead. All that matters now is data.
There are no Identity Providers, just data brokers.
Data supply and provenance are critical infrastructure for the digital economy.

For at least five years there has been a distinct push within the identity management industry towards attributes: a steady shift from who to what. It might have started at the Cloud Identity Summit in Napa Valley in 2013, where Google/PayPal/RSA veteran Andrew Nash, speaking on a panel of "iconoclasts" announced that 'attributes are more interesting than identity'. A few months earlier, the FIDO Alliance had been born. On a mission to streamline authentication, FIDO protocols modestly operate low down the technology stack and leave identification as a policy matter to be sorted out by implementers at the application level. Since 2013, we've also seen the Vectors of Trust initiative which breaks out different dimensions of authentication decision making, and a revamp of the US Federal Government Authentication Guide NIST SP 800-63, with a refinement of the coarse old Levels of Assurance. And Lockstep is developing new attribute certificate techniques.

Across cyberspace, provenance is the hottest topic. How do we know what's real online? How can we pick fake accounts, fake news, even fake videos?

Provenance in identity management is breaking out all over, with intense interest in Zero Knowledge Proofs of attributes in many Self Sovereign Identity projects, and verified claims being standardised in a W3C standards working group.

These efforts promise to reverse our long slide into complication. Identity has been over-analysed and authentication over-engineered. The more strongly we identify, the more we disclose, and the unintended consequences just keep getting worse.

Yet it doesn't have to be so. Here's what really matters:

  • What do you need to know about someone or something in order to deal with them?
  • Where will you get that knowledge?
  • How will you know it's true?

These should be the concerns of authentication. It's not identity per se that matters; it's not even attributes or claims. Attributes are just data, and provenance lies in metadata.
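
One way to picture this (a hypothetical sketch; the field names are mine, not any standard's): the attribute itself is just a value, and everything a Relying Party actually needs to judge sits in the metadata around it.

```python
# Hypothetical shape of an attribute plus its provenance metadata (not a W3C or other standard).
from dataclasses import dataclass

@dataclass
class ProvenancedAttribute:
    subject: str       # who or what the attribute is about
    name: str          # what is claimed, e.g. "age_over_18" or "registered_nurse"
    value: str         # the attribute itself - just data
    issuer: str        # where the knowledge came from
    issued_at: str     # when it was asserted
    signature: bytes   # cryptographic evidence it really came from the issuer, unaltered

# A Relying Party decides whether this is enough to transact - no "identity" required.
claim = ProvenancedAttribute(
    subject="account:12345",
    name="age_over_18",
    value="true",
    issuer="registry.example.gov",
    issued_at="2018-05-01",
    signature=b"<issuer signature bytes>",
)
```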

    It has become conventional wisdom in IDAM that few transactions really need your identity. So why don't we just kill it off? Let's focus instead on what users really need to know when they transact, and work out how to deliver that knowledge as we design transaction systems.

    IDAM has been framed for years around a number of misnomers. "Digital identity" for instance is nothing like identity in real life, and "digital signatures" are very strange signatures. Despite the persistent cliché, there are no online "passports".

    But the worst idea of all is the Identity Provider, invented over a decade ago to try and create a new order in cybersecurity. It's an understandable abstraction to regard bank accounts for example as "identities" and it follows that banks can be regarded as "identity providers". But these theoretical models have proved sterile. How many banks in fact see themselves as Identity Providers? No IdPs actually emerged from well-funded programs like Identrus or the Australian Trust Centre; just one bank set up as an IdP in the GOV.UK Verify program. If Identity Providers are such a good idea, they should be widespread by now in all advanced digitizing economies.

    The truth is that Identity Providers as imagined can't deliver. Identity is in the eye of the Relying Party. The state of being identified is determined by a Relying Party once they are satisfied they know enough about a data subject to manage the risk of transacting with them. Identity is a metaphor for being in a particular relationship, defined by the Relying Party (for it is the RP that carries most of the risk if an identification is flawed). Identity is not the sort of good or service that can be provided but only conferred by Relying Parties. The metaphor is all wrong.

    Digital identity has failed to materialise, because it's a false idol.

    In the late 1990s, when critics said Quality is Dead! they didn’t mean quality doesn’t matter, but that the formalities, conventions and patterns of the Quality Movement had become counterproductive. That's what I mean about digital identity. The movement has failed us.

    We don't need to know who people are online; we need to know certain specifics about them, case by case. So let's get over "identity", and devote our energies to the critical infostructure needed to supply the reliable data and metadata essential for an orderly digital economy.

    Posted in Identity

What if genes aren't entirely digital?

    An unpublished letter to Nature magazine.

Sheila Jasanoff and J. Benjamin Hurlbut, in their call for a global observatory for gene editing (Nature 555, 435; 2018), stress the "tendency to fall back on the framings that those at the frontiers of research find most straightforward and digestible". My concern is that the most digestible framing of all – that genes are editable in the first place – is gravely misleading.

The telling metaphor of genes-are-code arose in the 1950s at the coincidental dawns of genetics and computer science. Codons (combinations of DNA base pairs) map precisely onto different amino acids according to the so-called genetic code, which is nearly universal for all life on Earth. And thus, sequences of base pairs form genes, which "code" for proteins and enzymes, in what looks beguilingly like a program specifying an organism. But while the low level genetic code is neat and digital, what happens further up the biochemical stack is much more analogue. Proteins and enzymes are never single purpose, and never play their roles within a body in isolation. Genes, unlike computer instructions, are not compartmentalised; genomes, unlike computer programs, are not designed one at a time, but have evolved as intricate ensembles, with selection pressures operating between bodies, and between species.
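
To see how neat and digital that lowest layer really is, here is a toy lookup in Python (just a handful of the 64 codons, purely for illustration); nothing further up the biochemical stack yields to such a clean table.

```python
# Toy illustration: the lowest level of the genetic code is a digital lookup table.
GENETIC_CODE = {
    "ATG": "Methionine",     # also the usual start signal
    "TTT": "Phenylalanine",
    "GAA": "Glutamate",
    "TGG": "Tryptophan",
    "TAA": "Stop",           # translation stop signal
}

def translate(dna: str) -> list[str]:
    """Read a DNA sequence three bases at a time and map each codon to an amino acid."""
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    return [GENETIC_CODE.get(codon, "?") for codon in codons]

print(translate("ATGTTTGAATGG"))   # ['Methionine', 'Phenylalanine', 'Glutamate', 'Tryptophan']
```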

    The genes-are-code metaphor should have been re-examined as genetics evolved. The decidedly non-computer-like reality has been betrayed over time by discoveries like interactive gene expression, epigenetics, and the sobering fact that non-coding "junk DNA" is not junk after all. One wonders if applied biology was held up for decades by the simplistic presumption that DNA had to code for something in order to be functional.

    If the genome is not really digital, then "hacking" it like code will inevitably have unintended consequences. It's not just the public which needs a better understanding of genetic engineering but the mislabelled "engineers" themselves.

    April 14, 2018.

    Posted in Software engineering, Science

    Latest Card Fraud Statistics for Australia FY2017

The Australian Payments Network (formerly the Australian Payments Clearing Association, APCA) releases card fraud statistics (http://auspaynet.com.au/resources/fraud-statistics/) every six months for the preceding 12-month period. For well over a decade now, Lockstep has been monitoring these figures, plotting the trend data and analysing what the industry is doing (and not doing) about Card Not Present fraud. Here is our summary for the most recent financial year 2017 stats.

[Figure: Australian CNP fraud trends to FY2017]

    Total card fraud went up only 3% from FY16 to FY17; Card Not Present (CNP) fraud was up 10% to $443 million, representing 86% of all fraud perpetrated on Australian payment cards.

    CNP fraud is enabled by the difficulty merchants (and merchant servers) have telling the difference between original cardholder details and stolen data. Criminals procure stolen details in enormous volumes and replay them against vulnerable shopping sites.

A proper foundational fix for replay attack is easily within reach: re-use the same cryptography that solves skimming and carding, and restore a seamless payment experience for card holders. Apple for one has grasped the nettle, and is using its Secure Element-based Apple Pay method (established now for card-present NFC payments) for Card Not Present transactions in-app.
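
As a rough sketch of what such a fix looks like from the merchant's side (hypothetical keys, fields and functions, not Apple Pay's or EMV's actual formats): the server accepts only payment data digitally signed over a fresh, single-use challenge, so raw stolen card details are useless and a captured cryptogram cannot be replayed.

```python
# Illustrative merchant-side check for CNP payments (not any real scheme's protocol).
import secrets
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

device_key = Ed25519PrivateKey.generate()    # stands in for a phone's Secure Element key
device_public_key = device_key.public_key()  # registered with the issuer/scheme
used_nonces: set[bytes] = set()

def new_checkout_nonce() -> bytes:
    """The merchant issues a fresh, single-use challenge for each checkout."""
    return secrets.token_bytes(16)

def accept_payment(order: bytes, nonce: bytes, cryptogram: bytes) -> bool:
    """Accept only a signature over this order and this nonce, and only once."""
    if nonce in used_nonces:
        return False                         # replayed cryptogram
    try:
        device_public_key.verify(cryptogram, order + nonce)
    except InvalidSignature:
        return False                         # raw stolen card details cannot forge this
    used_nonces.add(nonce)
    return True

# Device side (inside the Secure Element): sign the order together with the merchant's nonce.
nonce = new_checkout_nonce()
cryptogram = device_key.sign(b"ORDER=ACME-1001;AMOUNT=59.95" + nonce)
assert accept_payment(b"ORDER=ACME-1001;AMOUNT=59.95", nonce, cryptogram)
```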

See also my 2012 paper "Calling for a Uniform Approach to Card Fraud Offline and On" (PDF).

    Abstract

    The credit card payments system is a paragon of standardisation. No other industry has such a strong history of driving and adopting uniform technologies, infrastructure and business processes. No matter where you keep a bank account, you can use a globally branded credit card to go shopping in almost every corner of the world. The universal Four Party settlement model, and a long-standing card standard that works the same with ATMs and merchant terminals everywhere underpin seamless convenience. So with this determination to facilitate trustworthy and supremely convenient spending in every corner of the earth, it’s astonishing that the industry is still yet to standardise Internet payments. We settled on the EMV standard for in-store transactions, but online we use a wide range of confusing and largely ineffective security measures. As a result, Card Not Present (CNP) fraud is growing unchecked.

    This article argues that all card payments should be properly secured using standardised hardware. In particular, CNP transactions should use the very same EMV chip and cryptography as do card present payments.

    Posted in Payments

    The limits of artificial ethics

    The premise that machines can behave ethically

Logicians have long known that some surprisingly simple tasks have no algorithmic solutions. That is, not everything is computable. This fact seems lost on artificial intelligence pundits who happily imagine a world of robots without limits. The assumption that self-driving cars for instance will be able to deal automatically and ethically with life-and-death situations goes unexamined in the public discourse. If algorithms have critical limits, then the tacit assumption that ethics will be automated is itself unethical. Machine thinking is already failing in unnerving ways. The public and our institutions need to be better informed about the limits of algorithms.

Is it ethical to presume ethics will automate?

    Discussions of AI and ethics bring up a familiar suite of recurring questions. How do the biases of programmers infect the algorithms they create? What duty of care is owed to human workers displaced by robots? Should a self-driving car prioritise the safety of its occupants over that of other drivers and pedestrians? How should a battlefield robot kill?

    These posers seem to stake out the philosophical ground for various debates which policy makers presume will eventually yield morally “balanced” positions. But what if the ground is shaky? There are grave risks in accepting the premise that ethics may be computerized.

    The limits of computing

    To recap, an algorithm is a repeatable set of instructions, like a recipe or a computer program. Algorithms run like clockwork; for the same set of inputs (including history), an algorithmic procedure always produces the same result. Crucially, every algorithm has a fixed set of inputs, all of which are known at the time it is designed. Novel inputs cannot be accommodated without a (human) programmer leaving the program and going back to the drawing board. No program can be programmed to reprogram itself.

    No algorithm is ever taken by surprise, in the way a human can recognise the unexpected. A computer can be programmed to fail safe (hopefully) if some input value exceeds a design limit, but logically, it cannot know what to do in circumstances the designers did not foresee. Algorithms cannot think (or indeed do anything at all) “outside the dots”.

    Worse still, some problems are simply not computable. For instance, logicians know for certain that the Halting Problem – that is, working out if a given computer program is ever going to stop – has no general algorithmic solution.
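
The classic argument is short enough to sketch in code. Suppose a general halts() routine existed; the contrived program below would then halt exactly when halts() says it doesn't, a contradiction (the function names are mine, and of course halts() can never actually be written).

```python
# Sketch of the classic contradiction: assume a general halting test exists.
def halts(program, argument) -> bool:
    """Hypothetical oracle: True iff program(argument) eventually stops."""
    raise NotImplementedError("no general algorithm for this can exist")

def contrary(program):
    """Do the opposite of whatever halts() predicts about running program on itself."""
    if halts(program, program):
        while True:       # halts() said it stops, so loop forever
            pass
    return                # halts() said it loops forever, so stop immediately

# contrary(contrary) halts if and only if it doesn't - so no halts() can exist.
```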

    This is not to say some problems can’t be solved at all; rather, some problems cannot be treated with a single routine. In other words, problem solving often entails unanticipated twists and turns. And we know this of course from everyday life. Human affairs are characteristically unpredictable. Take the law. Court cases are not deterministic; in the more “interesting” trials, the very best legal minds are unable to pick the outcomes. If it were otherwise, we wouldn’t need courts (and we would be denied the inexhaustible cultural staple of the courtroom drama). The main thing that makes the law non-algorithmic is unexpected inputs (that is, legal precedents).

    How will we cope when AI goes wrong?

It is chilling when a computer goes wrong (and hence we have that other rich genre, the robot horror story). Lay people and technologists alike are ill-prepared for the novel failure modes that will come with AI. At least one death has already resulted from an algorithm mishandling an unexpected scenario (in the case of a Tesla on Autopilot failing to distinguish a white truck against a brightly lit sky). Less dramatic but equally unnerving anomalies have been seen in "racist" object classification and face recognition systems.

    Even if robots bring great net benefits to human well-being – for instance through the road toll improvements predicted for self-driving cars – society may not be ready for the exceptions. Historically, product liability cases have turned on forensic evidence of what went wrong, and finding points in the R&D or manufacturing processes where people should have known better. But algorithms are already so complicated that individuals can’t fairly be called “negligent” for missing a flaw (unless the negligence is to place too much faith in algorithms in general). At the same time, almost by design, AIs are becoming inscrutable, while they grow unpredictable. Neural networks for example don’t generally keep diagnostic execution traces like conventional software.

    We bring this uncertainty on ourselves. The promise of neural networks is that they replicate some of our own powers of intuition, but artificial brains have none of the self-awareness we take for granted in human decision making. After a failure, we may not be able to replay what a machine vision system saw, much less ask: What were you thinking?

    Towards a more modest AI

If we know that no computer on its own can solve something as simply stated as the Halting Problem, there should be grave doubts that robots can take on hard tasks like driving. Algorithms will always, at some point, be caught short in the real world. So how are we to compensate for their inadequacies? And how are community expectations to be better matched to the technical reality?

    Amongst other things, we need to catalogue the circumstances in which robots may need to hand control back to a human (noting that any algorithm for detecting algorithmic failure will itself fail at some point!). The ability to interrogate anomalous neural networks deserves special attention, so that adaptive networks do not erase precious evidence. And just as early neurology proceeded largely by studying rare cases of brain damage, we should systematise the study of AI failures, and perhaps forge a pathology of artificial intelligence.

    Posted in Software engineering, AI

    Latest Card Not Present Fraud Stats - Australia

The Australian Payments Network (formerly the Australian Payments Clearing Association, APCA) releases card fraud statistics (http://www.apca.com.au/payment-statistics/) every six months for the preceding 12-month period. For over a decade now, Lockstep has been monitoring these figures, plotting the trend data and analysing what the industry is doing - and not doing - about Card Not Present fraud. Here is our summary for the most recent calendar year 2016 stats.

[Figure: Australian CNP fraud trends to CY2016]

    Total card fraud climbed another 17% from 2015 to 2016; Card Not Present (CNP) fraud was up 15% to $417 million, representing 82% of all card fraud.

    CNP fraud is enabled by the difficulty merchants (and merchant servers) have telling the difference between original cardholder details and stolen data. Criminals procure stolen details in enormous volumes and replay them against vulnerable shopping sites.

A proper foundational fix for replay attack is easily within reach: re-use the same cryptography that solves skimming and carding, and restore a seamless payment experience for card holders. Apple for one has grasped the nettle, and is using its Secure Element-based Apple Pay method (established now for card-present NFC payments) for Card Not Present transactions in-app.

See also my 2012 paper "Calling for a Uniform Approach to Card Fraud Offline and On" (PDF).

    Abstract

    The credit card payments system is a paragon of standardisation. No other industry has such a strong history of driving and adopting uniform technologies, infrastructure and business processes. No matter where you keep a bank account, you can use a globally branded credit card to go shopping in almost every corner of the world. The universal Four Party settlement model, and a long-standing card standard that works the same with ATMs and merchant terminals everywhere underpin seamless convenience. So with this determination to facilitate trustworthy and supremely convenient spending in every corner of the earth, it’s astonishing that the industry is still yet to standardise Internet payments. We settled on the EMV standard for in-store transactions, but online we use a wide range of confusing and largely ineffective security measures. As a result, Card Not Present (CNP) fraud is growing unchecked.

    This article argues that all card payments should be properly secured using standardised hardware. In particular, CNP transactions should use the very same EMV chip and cryptography as do card present payments.

    With all the innovation in payments leveraging cryptographic Secure Elements in mobile phones, perhaps at last we will see CNP payments modernised for web and mobile shopping.

    Posted in Payments

    Yet another anonymity promise broken

    In 2016, the Australian government released, for research purposes, an extract of public health insurance data, comprising the 30-year billing history of ten percent of the population, with medical providers and patients purportedly de-identified. Melbourne University researcher Dr Vanessa Teague and her colleagues famously found quite quickly that many of the providers were readily re-identified. The dataset was withdrawn, though not before many hundreds of copies were downloaded from the government website.

    The government’s responses to the re-identification work were emphatic but sadly not positive. For one thing, legislation was written to criminalize the re-identification of ostensibly ‘anonymised’ data, which would frustrate work such as Teague’s regardless of its probative value to ongoing privacy engineering (the bill has yet to be passed). For another, the Department of Health insisted that no patient information has been compromised.

It seems less ironic than inevitable that the patients' anonymity could not, in fact, be taken as read. In follow-up work released today, Teague, with Dr Chris Culnane and Dr Ben Rubinstein, has now published a paper showing how patients in that data release may indeed be re-identified.
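
The general pattern behind such attacks is simple linkage: a handful of quasi-identifiers that look harmless in isolation can pin down a single record when combined with outside knowledge. The toy data and fields below are invented for illustration; the authors' actual method is set out in their paper.

```python
# Toy linkage attack: a few "harmless" facts isolate one supposedly de-identified record.
deidentified_claims = [
    {"pid": "A91", "year_of_birth": 1962, "state": "VIC", "procedure_dates": ["2004-07-12", "2011-03-02"]},
    {"pid": "B07", "year_of_birth": 1962, "state": "VIC", "procedure_dates": ["2009-10-30"]},
    {"pid": "C55", "year_of_birth": 1985, "state": "NSW", "procedure_dates": ["2011-03-02"]},
]

# Outside knowledge about a target, e.g. gleaned from social media or a news story.
target = {"year_of_birth": 1962, "state": "VIC", "known_procedure": "2011-03-02"}

matches = [
    r for r in deidentified_claims
    if r["year_of_birth"] == target["year_of_birth"]
    and r["state"] == target["state"]
    and target["known_procedure"] in r["procedure_dates"]
]
if len(matches) == 1:
    print("Re-identified:", matches[0]["pid"])   # the 'anonymous' record now belongs to the target
```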

The ability to re-identify patients from this sort of Open Data release is frankly catastrophic. The release of imperfectly de-identified healthcare data poses real dangers to patients with socially difficult conditions. This is surely well understood. What we now need to contend with is the question of whether Open Data practices like this deliver benefits that justify the privacy risks. That's going to be a tricky debate, for the belief in data science is bordering on religious.

    It beggars belief that any government official would promise "anonymity" any more. These promises just cannot be kept.

    Re-identification has become a professional sport. Researchers are constantly finding artful new ways to triangulate individuals’ identities, drawing on diverse public information, ranging from genealogical databases to social media photos. But it seems that no matter how many times privacy advocates warn against these dangers, the Open Data juggernaut just rolls on. Concerns are often dismissed as academic, or being trivial compared with the supposed fruits of research conducted on census data, Medicare records and the like.

    In "Health Data in an Open World (PDF)" Teague et al warn (not for the first time) that "there is a misconception that [protecting the privacy of individuals in these datasets] is either a solved problem, or an easy problem to solve” (p2). They go on to stress “there is no good solution for publishing sensitive unit-record level data that protects privacy without substantially degrading the usefulness of the data" (p3).

    What is the cost-benefit of the research done on these data releases? Statisticians and data scientists say their work informs government policy, but is that really true? Let’s face it. "Evidence based policy" has become quite a joke in Western democracies. There are umpteen really big public interest issues where science and evidence are not influencing policy settings at all. So I am afraid statisticians need to be more modest about the practical importance of their findings when they mount bland “balance” arguments that the benefits outweigh the risks to privacy.

If there is a balance to be struck, then the standard way to make the calculation is a Privacy Impact Assessment (PIA). A PIA can formally assess the risk of "de-identified" data being re-identified, and where that risk is real, recommend additional, layered protections for privacy.

    So where are all the PIAs?

    Open Data is almost a religion. Where is the evidence that evidence-based policy making really works?

    I was a scientist and I remain a whole-hearted supporter of publicly funded research. But science must be done with honest appraisal of the risks. It is high time for government officials to revisit their pat assertions of privacy and security. If the public loses confidence in the health system's privacy protection, then some people with socially problematic conditions might simply withdraw from treatment, or hold back vital details when they engage with healthcare providers. In turn, that would clearly damage the purported value of the data being collected and shared.

    Big Data-driven research on massive public data sets just seems a little too easy to me. We need to discuss alternatives to massive public releases. One option is to confine research data extracts to secure virtual data rooms, and grant access only to specially authorised researchers. These people would be closely monitored and audited; they would comprise a small set of researchers; their access would be subject to legally enforceable terms & conditions.

    There are compromises we all need to make in research on human beings. Let’s be scientific about science-based policy. Let’s rigorously test our faith in Open Data, and let’s please stop taking “de-identification” for granted. It’s really something of a magic spell.

    Posted in Big Data, Government, Privacy

    The myth of the informed Internet user

    Yet another Facebook ‘People You May Know’ scandal broke recently when a sex worker found that the social network was linking her clients to her “real identity”. Kashmir Hill reported the episode for Gizmodo.

This type of thing has happened before. In 2012, a bigamist was outed when his two wives were sent friend-suggestions. In 2016, Facebook introduced a psychiatrist's patients to each other (Kash Hill again). I actually predicted that scenario back in 2010, in a letter to the British Medical Journal.

    Facebook’s self-serving philosophy that there should be no friction and no secrets online has created this slippery slope, where the most tenuous links between people are presumed by the company to give it license to join things up. But note carefully that exposing ‘People You May Know’ (PYMK) is the tip of the iceberg; the chilling thing is that Facebook’s Big Data algorithms will be making myriad connections behind the scenes, long before it gets around to making introductions. Facebook is dedicated to the covert refining of all the things it knows about us, in an undying effort to value-add its information assets.

    It’s been long understood that Facebook has no consent to make these linkages. I wrote about the problem in a chapter of the 2013 Encyclopedia of Social Network Analysis and Mining (recently updated): “The import of a user’s contacts and use for suggesting friends represent a secondary use of Personal Information of third parties who may not even be Facebook members themselves and are not given any notice much less the opportunity to expressly consent to the collection.” Relatedly, Facebook also goes too far when it makes photo tag suggestions, by running its biometric face recognition algorithms in the background, a practice outlawed by European privacy authorities.

    We can generalise this issue, from the simple mining of contact lists, to the much more subtle collection of synthetic personal data. If Facebook determines through its secret Big Data algorithms that a person X is somehow connected to member Y, then it breaches X’s privacy to “out” them. There can be enormous harm, as we’ve seen in the case of the sex worker, if someone’s secrets are needlessly exposed, especially without warning. Furthermore, note that the technical privacy breach is deeper and probably more widespread: under most privacy laws worldwide, merely making a new connection in a database synthesizes personal information about people, without cause and without consent. I’ve called this algorithmic collection and it runs counter to the Collection Limitation principle.
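
A toy sketch of what algorithmic collection looks like in practice (the data and names are invented): neither person disclosed a relationship, yet a join over background data synthesises a brand-new item of personal information about both of them.

```python
# Toy illustration of "algorithmic collection": a join synthesises new personal information.
uploaded_contact_lists = {
    "member_x": {"+61400000001", "+61400000002"},
    "member_y": {"+61400000002", "+61400000003"},
}

# Nobody stated that X and Y know each other, yet the platform now "knows" it:
inferred_links = [
    (a, b)
    for a in uploaded_contact_lists
    for b in uploaded_contact_lists
    if a < b and uploaded_contact_lists[a] & uploaded_contact_lists[b]
]
print(inferred_links)   # [('member_x', 'member_y')] - collected by inference, without consent
```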

    This latest episode serves another purpose: it exposes the lie that people online are fully aware of what they’re getting themselves into.

    There’s a bargain at the heart of the social Internet, where digital companies provide fabulous and ostensibly free services in return for our personal information. When challenged about the fairness of this trade, the data barons typically claim that savvy netizens know there is no such thing as a free lunch, and are fully aware of how the data economy works.

    But that’s patently not the case. The data supply chain is utterly opaque. In Kash Hill’s article, she can’t figure out how Facebook has made the connection between a user’s carefully anonymous persona and her “real life” account (and Facebook isn’t willing to explain the “more than 100 signals that go into PYMK”). If this is a mystery to Hill, then it’s way beyond the comprehension of 99% of the population.

    The asymmetry in the digital economy is obvious, when the cleverest data scientists in the world are concentrated not in universities but in digital businesses (where they work on new ways to sell ads). Data is collected, synthesized, refined, traded and integrated, all behind our backs, in ever more complex, proprietary and invisible ways. If data is “the new crude oil”, then we’re surely approaching crunch time, when this vital yet explosive raw material needs better regulating.

    Posted in Facebook, Privacy

    Award winning blockchain paper at HIMSSAP17

    David Chou, CIO at Children’s Mercy Hospital Kansas City, and I wrote a paper “How Healthy is Blockchain Technology?” for the HIMSS Asia Pacific 17 conference in Singapore last week. The paper is a critical analysis of the strategic potential for current blockchains in healthcare applications, with a pretty clear conclusion that the technology is largely misunderstood, and on close inspection, not yet a good fit for e-health.

    And we were awarded Best Paper at the conference!

    The paper will be available soon from the conference website. The abstract and conclusions are below, and if you’d like a copy of the full paper in the meantime, please reach out to me at Steve@ConstellationR.com.

    Abstract

    Blockchain captured the imagination with a basket of compelling and topical security promises. Many of its properties – decentralization, security and the oft-claimed “trust” – are highly prized in healthcare, and as a result, interest in this technology is building in the sector. But on close inspection, first generation blockchain technology is not a solid fit for e-health. Born out of the anti-establishment cryptocurrency movement, public blockchains remove ‘people’ and ‘process’ from certain types of transactions, but their properties degrade or become questionable in regulated settings where people and process are realities. Having inspired a new wave of innovation, blockchain technology needs significant work before it addresses the broad needs of the health sector. This paper recaps what blockchain was for, what it does, and how it is evolving to suit non-payments use cases. We critically review a number of recent blockchain healthcare proposals, selected by a US Department of Health and Human Services innovation competition, and dissect the problems they are trying to solve.

    Discussion

    When considering whether first generation blockchain algorithms have a place in e-health, we should bear in mind what they were designed for and why. Bitcoin and Ethereum are intrinsically political and libertarian; their outright rejection of central authority is a luxury only possible in the rarefied world of cryptocurrency but is simply not rational in real world healthcare, where accountability, credentialing and oversight are essentials.

    Despite its ability to transact and protect pure “math-based money”, it is a mistake to think public blockchains create trust, much less that they might disrupt existing trust relationships and authority structures in healthcare. Blockchain was designed on an assumption that participants in a digital currency would not trust each other, nor want to know anything about each other (except for a wallet address). On its own, blockchain does not support any other real world data management.

    The newer Synchronous Ledger Technologies – including R3 Corda, Microsoft’s Blockchain as a Service, Hyperledger Fabric and IBM’s High Security Blockchain Network – are driven by deep analysis of the strengths and weaknesses of blockchain, and then re-engineering architectures to deliver similar benefits in use cases more complex and more nuanced than lawless e-cash. The newer applications involve orchestration of data streams being contributed by multiple parties (often in “coopetition”) with no one leader or umpire. Like the original blockchain, these ledgers are much more than storage media; their main benefit is that they create agreement about certain states of the data. In healthcare, this consensus might be around the order of events in a clinical trial, the consent granted by patients to various data users, or the legitimacy of serial numbers in the pharmaceuticals supply chain.

    Conclusion

We hope healthcare architects, strategic planners and CISOs will carefully evaluate how blockchain technologies, across what is now a spectrum of solutions, apply in their organizations, and understand the work entailed to bring solutions into production.

Blockchain is no silver bullet for the challenges in e-health. We find that current blockchain solutions will not dramatically change the way patient information is stored, because most people agree that personal information does not belong on blockchains. And it won't dispel the semantic interoperability problems of e-health systems; these are outside the scope of what blockchain was designed to do.

However, newer blockchain-inspired Synchronous Ledger Technologies show great potential to address nuanced security requirements in complex networks of cooperating/competing actors. The excitement around the first blockchain has been inspirational, and is giving way to earnest sector-specific R&D with benefits yet to come.

    Posted in Security, Privacy, Innovation, e-health, Blockchain