
Correspondence in Nature magazine

I had a letter to the editor published in Nature on big data and privacy.

Data protection: Big data held to privacy laws, too

Stephen Wilson
Nature 519, 414 (26 March 2015) doi:10.1038/519414a
Published online 25 March 2015

Letter as published

Privacy issues around data protection often inspire over-engineered responses from scientists and technologists. Yet constraints on the use of personal data mean that privacy is less about what is done with information than what is not done with it. Technology such as new algorithms may therefore be unnecessary (see S. Aftergood, Nature 517, 435–436; 2015).

Technology-neutral data-protection laws afford rights to individuals with respect to all data about them, regardless of the data source. More than 100 nations now have such data-privacy laws, typically requiring organizations to collect personal data only for an express purpose and not to re-use those data for unrelated purposes.

If businesses come to know your habits, your purchase intentions and even your state of health through big data, then they have the same privacy responsibilities as if they had gathered that information directly by questionnaire. This is what the public expects of big-data algorithms that are intended to supersede cumbersome and incomplete survey methods. Algorithmic wizardry is not a way to evade conventional privacy laws.

Stephen Wilson
Constellation Research, Sydney, Australia.

Posted in Science, Privacy, Big Data

On pure maths and innovation

An unpublished letter to the editor of The New Yorker, February 2015.

My letter

Alec Wilkinson says in his absorbing profile of the quiet genius Yitang Zhang ("The pursuit of beauty", February 2) that pure mathematics is done "with no practical purposes in mind". I do hope mathematicians will forever be guided by aesthetics more than economics, but nevertheless, pure maths has become a cornerstone of the Information Age, just as physics was of the Industrial Revolution. For centuries, prime numbers might have been intellectual curios but in the 1970s they were beaten into modern cryptography. The security codes that scaffold almost all e-commerce are built from primes. Any advance in understanding these abstract materials impacts the Internet itself, for better or for worse. So when Zhang demurs that his result is "useless for industry", he's misspeaking.

The online version of the article is subtitled "Solving an Unsolvable Problem". The apparent oxymoron belies a wondrous pattern we see in mathematical discovery. Conundrums widely accepted to be impossible are in fact solved quite often, and then frenetic periods of innovation usually follow. The surprise breakthrough is typically inefficient (or, far worse in a mathematician's mind, ugly) but it can inspire fresh thinking and lead to polished methods. We are in one of these intense creative periods right now. Until 2008, it was widely thought that true electronic cash was impossible, but then the mystery figure Satoshi Nakamoto created Bitcoin. While it overturned the conventional wisdom, Bitcoin is slow and anarchic, and problematic as mainstream money. But it has triggered a remarkable explosion of digital currency innovation.

A published letter

Another letter writer made a similar point:

As Alec Wilkinson points out in his Profile of the math genius Yitang Zhang, results in pure mathematics can be sources of wonder and delight, regardless of their applications. Yet applications do crop up. Nineteenth-century mathematicians showed that there are geometries as logical and complete as Euclidean geometry, but which are utterly distinct from it. This seemed of no practical use at the time, but Albert Einstein used non-Euclidean geometry to make the most successful model that we have of the behavior of the universe on large scales of distance and time. Abstract results in number theory, Zhang’s field, underlie cryptography used to protect communication on devices that many of us use every day. Abstract mathematics, beautiful in itself, continually results in helpful applications, and that’s pretty wonderful and delightful, too.

David Lee
Sandy Spring, Md.

On innovation

My favorite example of mathematical innovation concerns public key cryptography (and I ignore here the credible reports that PKC was invented by the Brits decades before but kept secret). For centuries, there was essentially one family of cryptographic algorithms, in which a secret key shared by sender and recipient is used to both encrypt and decrypt the protected communication. Key distribution is the central problem in so-called "Symmetric" Cryptography: how does the sender get the secret key to the recipient some time before sending the message? The dream was for the two parties to be able to establish a secret key without ever having to meet or use any secret channel. It was thought to be an unsolvable problem ... until it was solved by Ralph Merkle in 1974. His solution, dubbed "Merkle's Puzzles", was almost hypothetical; the details don't matter here but they were going to be awkward, to put it mildly, involving millions of small messages. But the impact on cryptography was near instantaneous. The fact that, in theory, two parties really could establish a shared secret via public messages triggered a burst of development of practical public key cryptography, first of the Diffie-Hellman algorithm, and then RSA by Ron Rivest, Adi Shamir and Leonard Adleman. We probably wouldn't have e-commerce if it wasn't for Merkle's crazy curious maths.
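The idea that two parties can agree on a secret using only public messages can be sketched in a few lines of Python. This is a toy version of the later Diffie-Hellman exchange, not Merkle's Puzzles, and the modulus, base and private numbers are assumptions picked for readability; they are far too small to be secure.

```python
# Toy Diffie-Hellman key agreement (illustrative only; all parameters are
# assumptions chosen for readability and are far too small to be secure).

P = 4294967291  # public modulus: a small prime (2**32 - 5)
G = 5           # public base, assumed for illustration


def public_value(private_key: int) -> int:
    """Value a party may send in the open: G**private_key mod P."""
    return pow(G, private_key, P)


def shared_secret(their_public: int, my_private: int) -> int:
    """Secret computed from the other party's public value and one's own private key."""
    return pow(their_public, my_private, P)


# Each party picks a private number and never transmits it.
alice_private = 123456789
bob_private = 987654321

# Only these two values cross the public channel.
alice_public = public_value(alice_private)
bob_public = public_value(bob_private)

# Both sides compute the same secret without ever having met.
alice_secret = shared_secret(bob_public, alice_private)
bob_secret = shared_secret(alice_public, bob_private)
```

Both secrets agree because raising G to the two private numbers gives the same result in either order. An eavesdropper sees only the public values and would have to undo the modular exponentiation, which is believed to be infeasible at realistic sizes.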

Posted in Security, Science

Facebook's lab rats

It's long been said that if you're getting something for free online, then you're not the customer, you're the product. It's a reference to the one-sided bargain for personal information that powers so many social businesses - the way that "infomopolies" as I call them exploit the knowledge they accumulate about us.

Now it's been revealed that we're even lower than product: we're lab rats.

Facebook data scientist Adam Kramer, with collaborators from UCSF and Cornell, this week reported on a study in which they tested how Facebook users respond psychologically to alternately positive and negative posts. Their experimental technique is at once ingenious and shocking. They took the real life posts of nearly 700,000 Facebook members, and manipulated them, turning them slightly up- or down-beat. And then Kramer et al. measured the emotional tone in how people reading those posts reacted in their own feeds. See Experimental evidence of massive-scale emotional contagion through social networks, Adam Kramer, Jamie Guillory & Jeffrey Hancock, in Proceedings of the National Academy of Sciences, v111.24, 17 June 2014.

The resulting scandal has been well-reported by many, including Kashmir Hill in Forbes, whose blog post nicely covers how the affair has unfolded, and includes a response by Adam Kramer himself.

Plenty has been written already about the dodgy (or non-existent) ethics approval, and the entirely contemptible claim that users gave "informed consent" to have their data "used" for research in this way. I draw attention to the fact that consent forms in properly constituted human research experiments are famously thick. They go to great pains to explain what's going on, the possible side effects and potential adverse consequences. The aim of a consent form is to leave the experimental subject in no doubt whatsoever as to what they're signing up for. Contrast this with the Facebook Experiment where they claim informed consent was represented by a fragment of one sentence buried in thousands of words of the data usage agreement. And Kash Hill even proved that the agreement was modified after the experiment started! These are not the actions of researchers with any genuine interest in informed consent.

I was also struck by Adam Kramer's unvarnished description of their motives. His response to the furore (provided by Hill in her blog) is, as she puts it, tone deaf. Kramer makes no attempt whatsoever at a serious scientific justification for this experiment:

  • "The reason we did this research is because we care about the emotional impact of Facebook and the people that use our product ... [We] were concerned that exposure to friends’ negativity might lead people to avoid visiting Facebook."

That is, this large scale psychological experiment was simply for product development.

Some apologists for Facebook countered that social network feeds are manipulated all the time, notably by advertisers, to produce emotional responses.

Now that's interesting, because for their A-B experiment, Kramer and his colleagues took great pains to make sure the subjects were unaware of the manipulation. After all, the results would be meaningless if people knew what they were reading had been emotionally fiddled with.

In contrast, the ad industry has always insisted that today's digital consumers are super savvy, and they know the difference between advertising and real life. Yet the foundation of the Facebook experiment is that users are unaware of how their online experience is being manipulated. The ad industry's illogical propaganda [advertising is just harmless fun, consumers can spot the ads, they're not really affected by ads all that much ... Hey, wait a minute] has only been further exposed by the Facebook Experiment.

Advertising companies and Social Networks are increasingly expert at covertly manipulating perceptions, and now they have the data, collected dishonestly, to prove it.

Posted in Social Networking, Social Media, Science, Privacy, Internet, Culture

Watson the Doctor is no laughing matter

For the past year, oncologists at the Memorial Sloan Kettering Cancer Centre in New York have been training IBM’s Watson – the artificial intelligence tour-de-force that beat all comers on Jeopardy – to help personalise cancer care. The Centre explains that "combining [their] expertise with the analytical speed of IBM Watson, the tool has the potential to transform how doctors provide individualized cancer treatment plans and to help improve patient outcomes". Others are speculating already that Watson could "soon be the best doctor in the world".

I have no doubt that when Watson and things like it are available online to doctors worldwide, we will see overall improvements in healthcare outcomes, especially in parts of the world now under-serviced by medical specialists [having said that, the value of diagnosing cancer in poor developing nations is questionable if they cannot go on to treat it]. As with Google's self-driving car, we will probably get significant gains eventually, averaged across the population, from replacing humans with machines. Yet some of the foibles of computing are not well known and I think they will lead to surprises.

For all the wondrous gains made in Artificial Intelligence, where Watson is now the state of the art, A.I. remains algorithmic, and for that reason it has inherent limitations that don't get enough attention. Computer scientists and mathematicians have known for generations that some surprisingly straightforward problems have no algorithmic solution. That is, some tasks cannot be accomplished by any universal step-by-step codified procedure. The classic example is the Halting Problem; others, like the Travelling Salesperson Problem, can be solved in principle but have no known efficient algorithm. If such simply stated challenges defeat algorithms, we need to be more sober in our expectations of computerised intelligence.
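The usual argument for why the Halting Problem has no algorithm can be sketched in Python. The names `refute` and `claimed_halts` are hypothetical, and the sketch assumes the candidate decider is deterministic. It illustrates the contradiction and is not a formal proof.

```python
from typing import Callable, Tuple


def refute(claimed_halts: Callable[[Callable[[], object]], bool]) -> Tuple[bool, bool]:
    """Build a program on which the claimed halting-decider must be wrong.

    Returns (prediction, actual): what the decider said about the program,
    and what the program really does.
    """

    def paradox() -> str:
        # Do the opposite of whatever the decider predicts about this function.
        if claimed_halts(paradox):
            while True:  # predicted to halt, so loop forever
                pass
        return "halted"  # predicted to loop, so halt at once

    prediction = claimed_halts(paradox)
    if not prediction:
        paradox()  # safe to run: it returns immediately on this branch
    # By construction, paradox halts exactly when the decider says it does not.
    actual = not prediction
    return prediction, actual


# Two naive candidate deciders, both hypothetical.
always_yes = lambda program: True
always_no = lambda program: False

results = [refute(always_yes), refute(always_no)]
```

Whatever decider is supplied, its prediction about `paradox` disagrees with what `paradox` actually does, so no such decider can be right about every program.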

A key limitation of any programmed algorithm is that it must make its decisions using a fixed set of inputs that are known and fully characterised (by the programmer) at design time. If you spring an unexpected input on any computer, it can fail, and yet that's what life is all about -- surprises. No mathematician seriously claims that what humans do is somehow magic; most believe we are computers made of meat. Nevertheless, when paradoxes like the Halting Problem abound, we can be sure that computing and cognition are not what they seem. We should hope these conundrums are better understood before putting too much faith in computers doing deep human work.

And yet, predictably, futurists are jumping ahead to imagine "Watson apps" in which patients access the supercomputer for themselves. Even if there were reliable algorithms for doctoring, I reckon the "Watson app" is a giant step, because of the complex way the patient's conditions are assessed and data is gathered for the diagnosis. That is, the taking of the medical history.

In these days of billion dollar investments in electronic health records (EHRs), we tend to think that medical decisions are all about the data. When politicians announce EHR programs they often boast that patients won't have to go through the rigmarole of giving their history over and over again to multiple doctors as they move through an episode of care. This is actually a serious misunderstanding of the importance in clinical decision-making of the interaction between medico and patient when the history is taken. It's subtle. The things a patient chooses to tell, the things they seem to be hiding, and the questions that make them anxious, all guide an experienced medico when taking a history, and provide extra cues (metadata if you will) about the patient’s condition.

Now, Watson may well have the ability to navigate this complexity and conduct a very sophisticated Q&A. It will certainly have a vastly bigger and more reliable memory of cases than any doctor, and with that it can steer a dynamic patient questionnaire. But will Watson be good enough to be made available direct to patients through an app, with no expert human mediation? Or will a host of new input errors result from patients typing their answers into a smart phone or speaking into a microphone, without any face-to-face subtlety (let alone human warmth)? It was true of mainframes and it’s just as true of the best A.I.: Bulldust in, bulldust out.

Finally, Watson's existing linguistic limitations are not to be underestimated. It is surely not trivial that Watson struggles with puns and humour. Futurist Mark Pesce when discussing Watson remarked in passing that scientists don’t understand the "quirks of language and intelligence" that create humour. The question of what makes us laugh does in fact occupy some of the finest minds in cognitive and social science. So we are a long way from being able to mechanise humour. And this matters because for the foreseeable future, it puts a great deal of social intercourse beyond AI's reach.

In between the extremes of laugh-out-loud comedy and a doctor’s dry written notes lies a spectrum of expressive subtleties, like a blush, an uncomfortable laugh, shame, and the humiliation that goes with some patients’ lived experience of illness. Watson may understand the English language, but does it understand people?

Watson can answer questions, but good doctors ask a lot of questions too. When will this amazing computer be able to hold the sort of two-way conversation that we would call a decent "bedside manner"?


Posted in Software engineering, Science, Language, e-health, Culture, Big Data

The beginning of privacy!

The headlines proclaim that the newfound ability to re-identify anonymous DNA donors means The End Of Privacy!

No it doesn't, it only means the end of anonymity.

Anonymity is not the same thing as privacy. Anonymity keeps people from knowing what you're doing, and it's a vitally important quality in many settings. But in general we usually want people (at least some people) to know what we're up to, so long as they respect that knowledge. That's what privacy is all about. Anonymity is a terribly blunt instrument for protecting privacy, and it's also fragile. If anonymity is all you have, then you're in deep trouble when someone manages to defeat it.

New information technologies have clearly made anonymity more difficult, yet it does not follow that we must lose our privacy. Instead, these developments bring into stark relief the need for stronger regulatory controls that compel restraint in the way third parties deal with Personal Information that comes into their possession.

A great example is Facebook's use of facial recognition. When Facebook members innocently tag one another in photos, Facebook creates biometric templates with which it then automatically processes all photo data (previously anonymous), looking for matches. This is how they can create tag suggestions, but Facebook is notoriously silent on what other applications it has for facial recognition. Now and then we get a hint, with, for example, news of the Facedeals start-up last year. Facedeals accesses Facebook's templates (under conditions that remain unclear) and uses them to spot customers as they enter a store to automatically check them in. It's classic social technology: kinda sexy, kinda creepy, but clearly in breach of Collection, Use and Disclosure privacy principles.

And indeed, European regulators have found that Facebook's facial recognition program is unlawful. The chief problem is that Facebook never properly disclosed to members what goes on when they tag one another, and they never sought consent to create biometric templates with which to subsequently identify people throughout their vast image stockpiles. Facebook has been forced to shut down their facial recognition operations in Europe, and they've destroyed their historical biometric data.

So privacy regulators in many parts of the world have real teeth. They have proven that re-identification of anonymous data by facial recognition is unlawful, and they have managed to stop a very big and powerful company from doing it.

This is how we should look at the implications of the DNA 'hacking'. Indeed, Melissa Gymrek from the Whitehead Institute said in an interview: "I think we really need to learn to deal with the fact that we cannot ever make data sets truly anonymous, and that I think the key will be in regulating how we are allowed to use this genetic data to prevent it from being used maliciously."

Perhaps this episode will bring even more attention to the problem in the USA, and further embolden regulators to enact broader privacy protections there. Perhaps the very extremeness of the DNA hacking does not spell the end of privacy so much as its beginning.

Posted in Social Media, Science, Privacy, Biometrics, Big Data

Letter to Science: Re-identification of DNA may need ethics approval


I had a letter published in Science magazine about the recently publicised re-identification of anonymously donated DNA data. It has been shown that there is enough named genetic information online, in genealogical databases for instance, that anonymous DNA posted in research databases can be re-identified. This is a sobering result indeed. But does it mean that 'privacy is dead'?

No. The fact is that re-identification of erstwhile anonymous data represents an act of collection of PII and is subject to the Collection Limitation Principle in privacy law around the world. This is essentially the same scenario as Facebook using biometric facial recognition to identify people in photos. European regulators recently found Facebook to have breached privacy law and have forced Facebook to shut down their facial recognition feature.

I expect that the very same legal powers will permit regulators to penalise the unauthorised re-identification of DNA. There are legal constraints on what can be done with 'anonymous' data no matter where you get it from: under some data privacy laws, attaching names to such data constitutes a Collection of PII, and as such, is subject to consent rules and all sorts of other principles. As a result, bioinformatics researchers will have to tread carefully, justifying their ends and their means before ethics committees. And corporations who seek to exploit the ability to put names on anonymous genetic data may face the force of the law as Facebook did.

To summarise: Let's assume Subject S donates their DNA, ostensibly anonymously, to a Researcher R1, under some consent arrangement which concedes there is a possibility that S will be re-identified. And indeed, some time later, an independent researcher R2 does identify S as belonging to the DNA sample. The fact that many commentators seem oblivious to is this: R2 has Collected Personal Information (or PII) about S.* If R2 has no relationship with S, then S has not consented to this new collection of their PII. In jurisdictions with strict Collection Limitation (like the EU, Australia and elsewhere), it seems to me to be a legal privacy breach for R2 to collect PII by way of DNA re-identification without express consent, regardless of whether R1 has conceded to S that it might happen. Even in the US, where the protections might not be so strict, there remains a question of ethics: should R2 conduct themselves in a manner that might be unlawful in other places?

* Footnote: The collection by R2 of PII about S in this case is indirect, but nevertheless is a fresh collection. Logically, if anonymous data is converted into identifiable data, then PII has been collected. In most cases, the Collection Limitation principle in privacy is technology neutral. Privacy laws do not generally care how PII is collected; if PII is created by some process (such as re-identification) then privacy principles still apply. And this is what consumers would expect. If a Big Data process allows an organisation to work out insights about people without having to ask them explicit questions (which is precisely why Big Data is so valuable to business and governments), then the individuals concerned should expect that privacy principles still apply.
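The point that re-identification creates PII that did not exist before can be illustrated with a toy record linkage in Python. Every name, marker and value below is invented, and the exact-match join is an assumption made for brevity; real re-identification relies on statistical matching and extra metadata and is far more involved.

```python
# Hypothetical records: every name, marker and value here is invented.
anonymous_samples = [
    {"sample_id": "A1", "markers": ("m12", "m47", "m90"), "condition": "asthma"},
    {"sample_id": "A2", "markers": ("m03", "m47", "m66"), "condition": "none"},
]

named_genealogy = [
    {"surname": "Example", "markers": ("m12", "m47", "m90")},
    {"surname": "Sample", "markers": ("m55", "m61", "m72")},
]


def reidentify(samples, genealogy):
    """Attach surnames to anonymous samples whose markers match a named record."""
    by_markers = {record["markers"]: record["surname"] for record in genealogy}
    linked = []
    for sample in samples:
        surname = by_markers.get(sample["markers"])
        if surname is not None:
            # This combined record did not exist before the join:
            # it is a fresh item of personal information.
            linked.append({"surname": surname, **sample})
    return linked


new_pii = reidentify(anonymous_samples, named_genealogy)
```

Neither input on its own ties a named person to a health condition. The output does, which is why the join itself can be regarded as an act of collection.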

The text of my letter to Science follows, and after that, I'll keep posting follow ups.

Legal Limits to Data Re-Identification

Science 8 February 2013:
Vol. 339 no. 6120 p. 647

Yaniv Erlich at the Whitehead Institute for Biomedical Research used his hacking skills to decipher the names of anonymous DNA donors ("Genealogy databases enable naming of anonymous DNA donor," J. Bohannon, 18 January, p. 262). A little-known legal technicality in international data privacy laws could curb the privacy threats of reverse identification from genomes. "Personal information" is usually defined as any data relating to an individual whose identity is readily apparent from the data. The OECD Privacy Principles are enacted in over 80 countries worldwide [1]. Privacy Principle No. 1 states: "There should be limits to the collection of personal data and any such data should be obtained by lawful and fair means and, where appropriate, with the knowledge or consent of the data subject." The principle is neutral regarding the manner of collection. Personal information may be collected directly from an individual or indirectly from third parties, or it may be synthesized from other sources, as with "data mining."

Computer scientists and engineers often don't know that recording a person's name against erstwhile anonymous data is technically an act of collection. Even if the consent form signed at the time of the original collection includes a disclaimer that absolute anonymity cannot be guaranteed, re-identifying the information later signifies a new collection. The new collection of personal information requires its own consent; the original disclaimer does not apply when third parties take data and process it beyond the original purpose for collection. Educating those with this capability about the legal meaning of collection should restrain the misuse of DNA data, at least in those jurisdictions that strive to enforce the OECD principles.

It also implies that bioinformaticians working "with little more than the Internet" to attach names to samples may need ethics approval, just as they would if they were taking fresh samples from the people concerned.

Stephen Wilson
Lockstep Consulting Pty Ltd
Five Dock Sydney, NSW 2046, Australia.
E-mail: swilson@lockstep.com.au

[1] Graham Greenleaf, "Global data privacy laws: 89 countries, and accelerating", Privacy Laws & Business International Report, Issue 115, Special Supplement, February 2012.


In an interview with Science Magazine on Jan 18, the Whitehead Institute's Melissa Gymrek discussed the re-identification methods, and the potential to protect against them. She concluded: "I think we really need to learn to deal with the fact that we cannot ever make data sets truly anonymous, and that I think the key will be in regulating how we are allowed to use this genetic data to prevent it from being used maliciously."

I agree completely. We need regulations. Elsewhere I've argued that anonymity is an inadequate way to protect privacy, and that we need a balance of regulations and Privacy Enhancing Technologies. And it's for this reason that I am not fatalistic about the fact that anonymity can be broken, because we have the procedural means to see that privacy is still preserved.

Posted in Science, Privacy, Big Data

Memetic engineering our identities

This blog post builds a little further on my ecological ideas about the state of digital identity, first presented at the AusCERT 2011 conference. I have submitted a fresh RSAC 2013 speaker proposal where I hope to show a much more fully developed memetic model.

The past twenty years have seen a great variety of identity methods and devices emerge in the digital marketplace. In parallel, Internet business in many sectors has developed under the existing metasystems of laws, sectoral regulations, commercial contracts, industry codes, and traditional risk management arrangements.

[Figure: Variety of identities]

As with Darwin's finches, the very variety of identity methods suggests an ecological explanation. It seems most likely that different methods have evolved in response to different environmental pressures.

The orthodox view today is that we are given a plurality of identities from the many organisations we do business with. Our bank account is thought to be a discrete identity, as is our employment, our studentship, our membership of a professional body, and our belonging to a social network. Identity federation seeks to take an identity out of its original context, and present it in another, so that we can strike up new relationships without having to repeat the enrolment processes. But in practice, established identities are brittle; they don't bend easily to new uses unanticipated by their original issuers. Even superficially similar identities are not readily relied upon, because of the contractual fine print. Famously in Australia, one cannot open a fresh bank account on the basis of having an existing account at another bank, even though their identification protocols are essentially identical under the law. Similarly, government agencies have historically struggled to cross-recognise each other's security clearances.

I have come to the conclusion that we have abstracted "identity" at too high a level. We need to drop down a level or two and make smarter use of how identities are constructed. It shouldn't be hard to do; we have a lot of the conceptual apparatus already. In particular, one of the better definitions of digital identity holds that it is a set of assertions or claims [Ref: The Laws of Identity]. Instead of federating rolled-up high level identities, we would have an easier time federating selected assertions.
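To make the contrast concrete, here is a minimal Python sketch of an identity modelled as a set of assertions, where only selected claims are presented to a new party. The `Identity` class, its `present` method and all claim names and values are hypothetical, invented for illustration; they do not come from any particular federation standard.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass(frozen=True)
class Identity:
    """A digital identity modelled as a set of assertions (claims) from one issuer."""

    issuer: str
    claims: Dict[str, str] = field(default_factory=dict)

    def present(self, requested: Iterable[str]) -> Dict[str, str]:
        """Release only the requested assertions, not the whole rolled-up identity."""
        return {name: self.claims[name] for name in requested if name in self.claims}


# A hypothetical banking identity; every value is made up.
bank_identity = Identity(
    issuer="Example Bank",
    claims={
        "name": "J. Citizen",
        "date_of_birth": "1980-01-01",
        "account_no": "000-111",
        "over_18": "true",
    },
)

# A new relying party asks for two assertions and receives nothing else.
presented = bank_identity.present(["over_18", "date_of_birth"])
```

The relying party never sees the account number or the rest of the banking relationship, so the fine print attached to the rolled-up identity matters less than the provenance of the individual claims.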

Now, generalising beyond the claims and assertions, consider that each digital identity is built from a broad ensemble of discrete technological and procedural traits, spanning such matters as security techniques, registration processes, activation processes, identity proofing requirements (which are regulated in some industries like banking and the healthcare professions), user interface, algorithms, key lengths, liability arrangements, and so on. These traits together with the overt identity assertions -- like date of birth, home address and social security number -- can be seen as memes: heritable units of business and technological "culture".

[Figure: IEEE Part B diagram]

The ecological frame leads us to ask: where did these traits come from? What forces acted upon the constituent identity memes to create the forms we see today? Well, we can see that different selection pressures operate in different business environments, and that memes evolve over time in response. Examples of selection pressures include fraud, privacy (with distinct pressures to both strengthen and weaken privacy playing out before our eyes), convenience, accessibility, regulations (like Basel II, banking KYC rules, medical credentialing rules, and HSPD-12), professional standards, and new business models like branch-less banking and associated Electronic Verification of Identity. Each of these factors shifts over time, usually moving in and out of equilibrium with other forces, and the memes shift too. Successful memes -- where success means that some characteristic like key length or number of authentication factors has proven effective in reducing risk -- are passed on to successive generations of identity solution. The result is that at any time, the ensemble of traits that make up an "identity" in a certain context represents the most efficient way to manage misidentification risks.

The "memome" of any given rolled-up identity -- like a banking relationship for instance -- is built from all sorts of ways of doing things, as illustrated. We can have different ways of registering new banking customers, checking their bona fides, storing their IDs, and activating their authenticators. Over time, these component memes develop in different ways, usually gradually, as the business environment changes, but sometimes in sudden step-changes when the environment is disrupted by e.g. a presidential security directive, or a new business model like branch-less banking. And as with real genomes, identity memes interact, changing how they are expressed, even switching each other on and off.

As they say, things are the way they are because they got that way.

I reckon before we try to make identities work across contexts they were not originally intended for, we need to first understand the evolutionary back stories for each of the identity memes, and the forces that shaped them to fit certain niches in business ecosystems. Then we may be able to literally do memetic engineering to adapt a core set of relationships to new business settings.

The next step is to rigorously document some of the back stories, and to see if the "phylomemetics" really hangs together.

Posted in Science, Identity, Federated Identity

Is quantum computing all it's cracked up to be?

Quantum computing continues to make strides. Now they've made a chip to execute Shor's quantum factorisation algorithm. Until now, quantum computers were built from bench-loads of apparatus, and had yet to be fabricated in solid state. So this is pretty cool, taking QC from science into engineering.

The promise of quantum computing is that it will eventually render today's core cryptography obsolete, by making it possible to factorise large numbers very quickly. The RSA algorithm for now is effectively unbreakable because its keys are the products of prime numbers hundreds of digits long. The product of two primes can be computed in split seconds; but to find the factors by brute force - and thus crack the code - takes billions of computer-years.
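The asymmetry between multiplying primes and recovering them can be seen in a toy Python example. Trial division is used here only as the crudest form of brute force, and the two primes are tiny compared with real RSA primes; practical classical attacks use much cleverer algorithms, yet still do not scale to real key sizes.

```python
def smallest_factor(n: int) -> int:
    """Brute-force trial division: return the smallest prime factor of n (n >= 2)."""
    if n % 2 == 0:
        return 2
    candidate = 3
    while candidate * candidate <= n:
        if n % candidate == 0:
            return candidate
        candidate += 2
    return n  # no divisor found, so n itself is prime


# Two small primes standing in for the secret halves of a key.
p, q = 999983, 1000003

# Multiplying them is a single cheap operation.
n = p * q

# Undoing the multiplication takes roughly half a million trial divisions here,
# and the work grows rapidly as the primes get longer.
recovered_p = smallest_factor(n)
recovered_q = n // recovered_p
```

With primes hundreds of digits long, the loop in `smallest_factor` would need more steps than any conventional computer could ever perform, which is the gap that Shor's algorithm threatens to close.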

I'm curious about one thing. Current prototype quantum computers are built with just a few qubits because of the "coherence" problem (so they can only factorise little numbers like 15 = 3 x 5). The machinery has to hold all the qubits in a delicate state of quantum uncertainty for long enough to complete the computation. The more qubits they have, the harder it is to maintain coherence. The task ahead is to scale up past the proof-of-concept stage to manage quite a lot of qubits and thus be able to crack 4096-bit RSA keys for instance.

Evidently it's very hard to build say a 100 qubit quantum computer right now. So my question is: What is the relationship between the difficulty of maintaining coherence and the number of qubits concerned? Is it exponentially difficult? Or worse?

Because if it is, then the way to stay ahead of quantum computing attack might be to simply go out to RSA keys tens of thousands of digits long.

If engineering quantum computers is inherently difficult (as is miniaturizing transistors), then no matter how good we get at it, arbitrarily long qubit computations will remain costly, and the arms race between cryptographic key length and brute force attack may continue indefinitely. The end of conventional encryption is not yet in sight.

Posted in Security, Science

Science is more than the books it produces

These days it's common to hear the modest disclaimer that there are some questions science can’t answer. I most recently came across such a show of humility by Dr John Kirk speaking on ABC Radio National's Ockham's Razor [1]. Kirk says that "science cannot adjudicate between theism and atheism" and insists that science cannot bridge the divide between physics and metaphysics. Yet surely the long history of science shows that divide is fluid.

Science is not merely about the particular answers; it's about the steady campaign on all that is knowable. Science demystifies.

Textbook examples are legion where new sciences have rendered previously fearsome phenomena as firstly explicable and then often manageable: astronomy, physiology, meteorology, sedimentology, seismology, microbiology, psychology and neurology, to name a few.

It's sometimes said that questions matter more in science than the answers. Good scientists ask good questions, but great ones show where there is no question anymore.

Once something profound is no longer beyond understanding, that awareness permeates society. Each wave of scientific advance usually becomes manifest by new technologies, but more important to the human condition is that science gives us confidence. In an enlightened society, those with no scientific training at all still appreciate that science gets how the world works. Over time this tacit rational confidence has energised modernity, supplanting astrologers, shamans, witch doctors, and even the churches. Laypeople may not know how televisions work, nor nuclear medicine, semiconductors, anaesthetics, antibiotics or fibre optics, but they sure know it's not by magic.

The arc of science parts mystery's curtain. Contrary to John Kirk's partitions, science frequently renders the metaphysical as natural and empirically knowable. My favorite example is the work of Galileo. To the pre-Copernican mind, the Sun was perfect and ethereal, but when Galileo trained his new telescope upon it, he saw spots. The imperfections of sunspots were shocking enough, but a real paradigm shift came when Galileo observed the sunspots moving across the face, disappearing off one side, and then returning days later on the other. Galileo's epiphany must have been heart-stopping: he saw that the Sun is a sphere turning on its axis: geometric, humble, altogether of this world, and very reasonably the centre of a solar system as Copernicus had reasoned a few decades earlier.

An even more dramatic turn was Darwin's discovery that all the world's living complexity was explicable without god. He not only neutralised the Argument from Design for the existence of god, but he also dispelled teleology, the search for ultimate reason. The deepest lesson of Darwinism is that there is simply no need to ask "What am I doing here?" because the wondrous complexity of all of biology, up to and including humanity, is seen to have arisen through natural selection, without a designer and without a reason. It seems philosophers appreciate the deep lessons of Darwinism more than our modest scientists: Karl Marx saw that evolution "deals the death-blow to teleology"; Friedrich Nietzsche proclaimed "God is dead ... we have killed him".

So why shouldn't we expect science to keep penetrating metaphysics? Why should we doubt, or perhaps fear, its power to remove all mystery? Of course many remaining riddles are very hard indeed, and I know there's no guarantee we will solve them. But I don't see any logic in flatly rejecting the possibility. Some physicists feel they're homing in on why the physical constants should have their special values. And many cognitive scientists and philosophers of the mind suspect a theory of consciousness is within reach. I'm not saying anyone really gets consciousness yet, but surely most would agree that it just doesn't feel like a total enigma anymore.

Science is more than the books it produces. It's the power to keep writing new ones.


[1]. "Why is science such a worry?" Ockham's Razor 18 December 2011 http://www.abc.net.au/radionational/programs/ockhamsrazor/ockham27s-razor-18-december-2011/3725968

Posted in Science, Culture

An improved frame for understanding Digital Identity

I’ve always been uneasy about the term “ecosystem” being co-opted when what is really meant is “marketplace”. There’s nothing wrong with marketplaces! They are where we technologists study problems and customer needs, launch our solutions, and jockey for share. The suddenly-orthodox “Identity Ecosystem” as expressed in the NSTIC is an elaborate IT architecture that defines specific roles for users, identity providers, relying parties and other players. And the proposed system is not yet complete; many commentators have anticipated that NSTIC may necessitate new legislation to allocate liability.

So it’s really not an “ecosystem”. True ecosystems evolve; they are not designed. If NSTIC isn’t even complete, then badging it an “ecosystem” seems more marketing than ecology; it’s an effort to raise NSTIC as a policy initiative above the hurly-burly of competitive IT.

My unease about “ecosystem” got me thinking about ecology, and whether genuine ecological thinking could be useful to analyse the digital identity environment. I believe there is potential here for a new and more powerful way to frame the identity problem.

We are surrounded by mature business ecosystems, in which different sectors and communities-of-interest, large and small, have established their own specific arrangements for managing risk. A special part of risk management is the way in which customers, users, members or business partners are identified. There is almost always a formal protocol by which an individual joins one of these communities-of-interest; for example, when they join a company, earn a professional qualification, or open a credit account. Some of these registration protocols are set freely by employers, merchants, associations and the like; others have a legislated element, in regulated industries like aviation, healthcare and finance; and in some cases registration is almost trivial, as when you sign up to a blog site. The conventions, rules, professional charters, contracts, laws and regulations that govern how people do business in different contexts are types of memes. They have evolved in different contexts to minimise risk, and have literally been passed on from one generation to another.

As business environments change, risk management rules change in response. And so registration processes are subject to natural selection. An ecological treatment of identity recognises that “selection pressures” act on those rules. For instance, to deal with increasing money laundering and terrorist financing, many prudential regulators have tightened the requirements for account opening. To deal with ID theft and account takeover, banks have augmented their account numbers with Two Factor Authentication. The US government’s PIV-I rules for employees and contractors were a response to Homeland Security Presidential Directive 12 (HSPD-12). Cell phone operators and airlines likewise now require extra proof of ID. Medical malpractice in various places has led hospitals to tighten their background checks on new staff.

It's natural and valuable, up to a point, to describe "identities" being provided to people acting in these different contexts. This abstraction is central to the Laws of Identity. Unfortunately the word identity is suggestive of a sort of magic property that can be taken out of one context and used in another. So despite the careful framing of the Laws of Identity, it seems that people still carry around a somewhat utopian idea of digital identity, and a tacit belief in the possibility of a universal digital passport (or at least a greatly reduced number of personal IDs). I have argued elsewhere that the passport is actually implausible. If I am right about that, then what is sorely needed next is a better frame for understanding digital identity, a perspective that helps people stay away from the dangerous temptation of a single passport, and understand that a plurality of identities is the natural state of being.

All modern identity thinking recognises that digital identities are context dependent. Yet it seems that federated identity projects have repeatedly underestimated the strength of that dependence. The federated identity movement is based on an optimism that we can change context for the IDs we have now and still preserve some recognisable and reusable core identity or, alternatively, that we can create a new, smaller set of IDs that will be useful for transacting with a superset of services. Such “interoperability” has only been demonstrated to date in near-trivial use cases like logging onto blog sites with unverified OpenIDs, or Facebook or Twitter handles. More sophisticated attempts to re-use identities across contexts – such as the Australian banking sector’s ill-fated Trust Centre project – have foundered, even when there is pre-existing common ground in identification protocols.

The greatest challenge in federated identity is getting service and identity providers that are accustomed to operating in their own silos, to accept risks arising from identification and/or authentication performed on/by their members in other silos. This is where the term “identity” can be distracting. It is important to remember that the “identities” issued by banks, government agencies, universities, cell phone companies, merchants, social networks and blog sites are really proxies for the arrangements to which members have signed up.

This is why identity is so very context dependent, and why some identities are so tricky to federate.

If we think ecologically, then a better word for the “context” that an identity operates in may be niche. This term properly evokes the tight evolved fit between an identity and the setting in which it is meaningful. In most cases, if we want to understand the natural history of identities, we can look to the existing business ecosystem from where they came. The environmental conditions that shaped the particular identities issued by banks, credit card companies, employers, governments, professional bodies etc. are not fundamentally changed by the Internet. As such, we should expect that when these identities transition from real world to digital, their properties – especially their “interoperability” and liability arrangements – cannot change a great deal. It is only the pure cyber identities like blogger names, OSN handles and gaming avatars that are highly malleable, because their environmental niches are not so specific.

As an aside, noting how they spread far and wide, and too quickly for us to predict the impacts, maybe it's accurate to liken OpenID and Facebook Connect to weeds!

A lot more work needs to be done on this ecological frame, but I’m thinking we need a new name, distinct from "federated" identity. It seems to me that most business, whether it be online or off, turns on Evolved Identities. If we appreciate the natural history of identities as having evolved, then we should have more success with digital identity. It will become clearer which identities can federate easily, and which cannot, because of their roots in real world business ecosystems.

Taking a digital identity (like a cell phone account) out of its natural niche and hoping it will interoperate in another niche (like banking) can be compared to taking a tropical salt water fish and dropping it into a fresh water tank. If NSTIC is an ecosystem, it is artificial. As such it may be as fragile as an exotic botanic garden or tropical aquarium. I fear that full blown federated identity systems are going to need constant care and intervention to save them from collapse.

Posted in Science, Language, Identity, Federated Identity, Culture, Security