It's long been said that if you're getting something for free online, then you're not the customer, you're the product. It's a reference to the one-sided bargain for personal information that powers so many social businesses - the way that "infomopolies" as I call them exploit the knowledge they accumulate about us.
Now it's been revealed that we're even lower than product: we're lab rats.
Facebook data scientist Adam Kramer, with collaborators from UCSF and Cornell, this week reported on a study in which they tested how Facebook users respond psychologically to alternatively positive and negative posts. Their experimental technique is at once ingenious and shocking. They took the real life posts of nearly 700,000 Facebook members, and manipulated them, turning them slightly up- or down-beat. And then Kramer at al measured the emotional tone in how people reading those posts reacted in their own feeds. See Experimental evidence of massive-scale emotional contagion through social networks, Adam Kramer,Jamie Guillory & Jeffrey Hancock, in Proceedings of the National Academy of Sciences, v111.24, 17 June 2014.
The resulting scandal has been well-reported by many, including Kashmir Hill in Forbes, whose blog post nicely covers how the affair has unfolded, and includes a response by Adam Kramer himself.
Plenty has been written already about the dodgy (or non-existent) ethics approval, and the entirely contemptible claim that users gave "informed consent" to have their data "used" for research in this way. I draw attention to the fact that consent forms in properly constituted human research experiments are famously thick. They go to great pains to explain what's going on, the possible side effects and potential adverse consequences. The aim of a consent form is to leave the experimental subject in no doubt whatsoever as to what they're signing up for. Contrast this with the Facebook Experiment where they claim informed consent was represented by a fragment of one sentence buried in thousands of words of the data usage agreement. And Kash Hill even proved that the agreement was modified after the experiment started! These are not the actions of researchers with any genuine interest in informed consent.
I was also struck by Adam Kramer's unvarnished description of their motives. His response to the furore (provided by Hill in her blog) is, as she puts it, tone deaf. Kramer makes no attempt whatsoever at a serious scientific justification for this experiment:
- "The reason we did this research is because we care about the emotional impact of Facebook and the people that use our product ... [We] were concerned that exposure to friends’ negativity might lead people to avoid visiting Facebook.
That is, this large scale psychological experiment was simply for product development.
Some apologists for Facebook countered that social network feeds are manipulated all the time, notably by advertisers, to produce emotional responses.
Now that's interesting, because for their A-B experiment, Kramer and his colleagues took great pains to make sure the subjects were unaware of the manipulation. After all, the results would be meaningless if people knew what they were reading had been emotionally fiddled with.
In contrast, the ad industry has always insisted that today's digital consumers are super savvy, and they know the difference between advertising and real-life. Yet the foundation of the Facebook experiment is that users are unaware of how their online experience is being manipulated. The ad industry's illogical propaganda [advertising is just harmless fun, consumers can spot the ads, they're not really affected by ads all that much ... Hey, with a minute] has only been further exposed by the Facebook Experiment.
Advertising companies and Social Networks are increasingly expert at covertly manipulating perceptions, and now they have the data, collected dishonestly, to prove it.
For the past year, oncologists at the Memorial Sloan Kettering Cancer Centre in New York have been training IBM’s Watson – the artificial intelligence tour-de-force that beat allcomers on Jeopardy – to help personalise cancer care. The Centre explains that "combining [their] expertise with the analytical speed of IBM Watson, the tool has the potential to transform how doctors provide individualized cancer treatment plans and to help improve patient outcomes". Others are speculating already that Watson could "soon be the best doctor in the world".
I have no doubt that when Watson and things like it are available online to doctors worldwide, we will see overall improvements in healthcare outcomes, especially in parts of the world now under-serviced by medical specialists [having said that, the value of diagnosing cancer in poor developing nations is questionable if they cannot go on to treat it]. As with Google's self-driving car, we will probably get significant gains eventually, averaged across the population, from replacing humans with machines. Yet some of the foibles of computing are not well known and I think they will lead to surprises.
For all the wondrous gains made in Artificial Intelligence, where Watson now is the state-of-the art, A.I. remains algorithmic, and for that, it has inherent limitations that don't get enough attention. Computer scientists and mathematicians have know for generations that some surprisingly straightforward problems have no algorithmic solution. That is, some tasks cannot be accomplished by any universal step-by-step codified procedure. Examples include the Halting Problem and the Travelling Salesperson Problem. If these simple challenges have no algorithm, we need be more sober in our expectations of computerised intelligence.
A key limitation of any programmed algorithm is that it must make its decisions using a fixed set of inputs that are known and fully characterised (by the programmer) at design time. If you spring an unexpected input on any computer, it can fail, and yet that's what life is all about -- surprises. No mathematician seriously claims that what humans do is somehow magic; most believe we are computers made of meat. Nevertheless, when paradoxes like the Halting Problem abound, we can be sure that computing and cognition are not what they seem. We should hope these conundrums are better understood before putting too much faith in computers doing deep human work.
And yet, predictably, futurists are jumping ahead to imagine "Watson apps" in which patients access the supercomputer for themselves. Even if there were reliable algorithms for doctoring, I reckon the "Watson app" is a giant step, because of the complex way the patient's conditions are assessed and data is gathered for the diagnosis. That is, the taking of the medical history.
In these days of billion dollar investments in electronic health records (EHRs), we tend to think that medical decisions are all about the data. When politicians announce EHR programs they often boast that patients won't have to go through the rigmarole of giving their history over and over again to multiple doctors as they move through an episode of care. This is actually a serious misunderstanding of the importance in clinical decision-making of the interaction between medico and patient when the history is taken. It's subtle. The things a patient chooses to tell, the things they seem to be hiding, and the questions that make them anxious, all guide an experienced medico when taking a history, and provide extra cues (metadata if you will) about the patient’s condition.
Now, Watson may well have the ability to navigate this complexity and conduct a very sophisticated Q&A. It will certainly have a vastly bigger and more reliable memory of cases than any doctor, and with that it can steer a dynamic patient questionnaire. But will Watson be good enough to be made available direct to patients through an app, with no expert human mediation? Or will a host of new input errors result from patients typing their answers into a smart phone or speaking into a microphone, without any face-to-face subtlety (let alone human warmth)? It was true of mainframes and it’s just as true of the best A.I.: Bulldust in, bulldust out.
Finally, Watson's existing linguistic limitations are not to be underestimated. It is surely not trivial that Watson struggles with puns and humour. Futurist Mark Pesce when discussing Watson remarked in passing that scientists don’t understand the "quirks of language and intelligence" that create humour. The question of what makes us laugh does in fact occupy some of the finest minds in cognitive and social science. So we are a long way from being able to mechanise humour. And this matters because for the foreseeable future, it puts a great deal of social intercourse beyond AI's reach.
In between the extremes of laugh-out-loud comedy and a doctor’s dry written notes lies a spectrum of expressive subtleties, like a blush, an uncomfortable laugh, shame, and the humiliation that goes with some patients’ lived experience of illness. Watson may understand the English language, but does it understand people?
Watson can answer questions, but good doctors ask a lot of questions too. When will this amazing computer be able to hold the sort of two-way conversation that we would call a decent "bedside manner"?
Have a disruptive technology implementation story? Get recognised for your leadership. Apply for the 2014 SuperNova Awards for leaders in disruptive technology.
No it doesn't, it only means the end of anonymity.
Anonymity is not the same thing as privacy. Anonymity keeps people from knowing what you're doing, and it's a vitally important quality in many settings. But in general we usually want people (at least some people) to know what we're up to, so long as they respect that knowledge. That's what privacy is all about. Anonymity is a terribly blunt instrument for protecting privacy, and it's also fragile. If anonymity was all you have, then you're in deep trouble when someone manages to defeat it.
New information technologies have clearly made anonymity more difficult, yet it does not follow that we must lose our privacy. Instead, these developments bring into stark relief the need for stronger regulatory controls that compel restraint in the way third parties deal with Personal Information that comes into their possession.
A great example is Facebook's use of facial recognition. When Facebook members innocently tag one another in photos, Facebook creates biometric templates with which it then automatically processes all photo data (previously anonymous), looking for matches. This is how they can create tag suggestions, but Facebook is notoriously silent on what other applications it has for facial recognition. Now and then we get a hint, with, for example, news of the Facedeals start up last year. Facedeals accesses Facebook's templates (under conditions that remain unclear) and uses them to spot customers as they enter a store to automatically check them in. It's classic social technology: kinda sexy, kinda creepy, but clearly in breach of Collection, Use and Disclosure privacy principles.
And indeed, European regulators have found that Facebook's facial recognition program is unlawful. The chief problem is that Facebook never properly disclosed to members what goes on when they tag one another, and they never sought consent to create biometric templates with which to subsequently identify people throughout their vast image stockpiles. Facebook has been forced to shut down their facial recognition operations in Europe, and they've destroyed their historical biometric data.
So privacy regulators in many parts of the world have real teeth. They have proven that re-identification of anonymous data by facial recognition is unlawful, and they have managed to stop a very big and powerful company from doing it.
This is how we should look at the implications of the DNA 'hacking'. Indeed, Melissa Gymrek from the Whitehead Institute said in an interview: "I think we really need to learn to deal with the fact that we cannot ever make data sets truly anonymous, and that I think the key will be in regulating how we are allowed to use this genetic data to prevent it from being used maliciously."
Perhaps this episode will bring even more attention to the problem in the USA, and further embolden regulators to enact broader privacy protections there. Perhaps the very extremeness of the DNA hacking does not spell the end of privacy so much as its beginning.
I had a letter published in Science magazine about the recently publicised re-identification of anonymously donated DNA data. It has been shown that there is enough named genetic information online, in genealogical databases for instance, that anonymous DNA posted in research databases can be re-identified. This is a sobering result indeed. But does it mean that 'privacy is dead'?
No. The fact is that re-identification of erstwhile anonymous data represents an act of collection of PII and is subject to the Collection Limitation Principle in privacy law around the world. This is essentially the same scenario as Facebook using biometric facial recognition to identify people in photos. European regulators recently found Facebook to have breached privacy law and have forced Facebook to shut down their facial recognition feature.
I expect that the very same legal powers will permit regulators to sanction the re-identification of DNA. There are legal constraints on what can be done with 'anonymous' data no matter where you get it from: under some data privacy laws, attaching names to such data constitutes a Collection of PII, and as such, is subject to consent rules and all sorts of other principles. As a result, bioinformatics researchers will have to tread carefully, justifying their ends and their means before ethics committees. And corporations who seek to exploit the ability to put names on anonymous genetic data may face the force of the law as Facebook did.
To summarise: Let's assume Subject S donates their DNA, ostensibly anonymously, to a Researcher R1, under some consent arrangement which concedes there is a possibility that S will be re-identified. And indeed, some time later, an independent researcher R2 does identify S as belonging to the DNA sample. The fact that many commentators seem oblivious to is this: R2 has Collected Personal Information (or PII) about S. If R2 has no relationship with S, then S has not consented to this new collection of her PII. In jurisdictions with strict Collection Limitation (like the EU, Australia and elsewhere) then it seems to me to be a legal privacy breach for R2 to collect PII by way of DNA re-identification without express consent, regardless of whether R1 has conceded to S that it might happen. Even in the US, where the protections might not be so strict, there remains a question of ethics: should R2 conduct themselves in a manner that might be unlawful in other places?
The text of my letter to Science follows, and after that, I'll keep posting follow ups.
Science 8 February 2013:
Vol. 339 no. 6120 pp. 647
Yaniv Erlich at the Whitehead Institute for Biomedical Research used his hacking skills to decipher the names of anonymous DNA donors ("Genealogy databases enable naming of anonymous DNA donor," J. Bohannon, 18 January, p. 262). A little-known legal technicality in international data privacy laws could curb the privacy threats of reverse identification from genomes. "Personal information" is usually defined as any data relating to an individual whose identity is readily apparent from the data. The OECD Privacy Principles are enacted in over 80 countries worldwide . Privacy Principle No. 1 states: "There should be limits to the collection of personal data and any such data should be obtained by lawful and fair means and, where appropriate, with the knowledge or consent of the data subject." The principle is neutral regarding the manner of collection. Personal information may be collected directly from an individual or indirectly from third parties, or it may be synthesized from other sources, as with "data mining."
Computer scientists and engineers often don't know that recording a person's name against erstwhile anonymous data is technically an act of collection. Even if the consent form signed at the time of the original collection includes a disclaimer that absolute anonymity cannot be guaranteed, re-identifying the information later signifies a new collection. The new collection of personal information requires its own consent; the original disclaimer does not apply when third parties take data and process it beyond the original purpose for collection. Educating those with this capability about the legal meaning of collection should restrain the misuse of DNA data, at least in those jurisdictions that strive to enforce the OECD principles.
It also implies that bioinformaticians working "with little more than the Internet" to attach names to samples may need ethics approval, just as they would if they were taking fresh samples from the people concerned.
Lockstep Consulting Pty Ltd
Five Dock Sydney, NSW 2046, Australia.
In an interview with Science Magazine on Jan 18, the Whitehead Institute's Melissa Gymrek discussed the re-identification methods, and the potential to protect against them. She concluded: "I think we really need to learn to deal with the fact that we cannot ever make data sets truly anonymous, and that I think the key will be in regulating how we are allowed to use this genetic data to prevent it from being used maliciously.".
I agree completely. We need regulations. Elsewhere I've argued that anonymity is an inadequate way to protect privacy, and that we need a balance of regulations and Privacy Enhancing Technologies. And it's for this reason that I am not fatalistic about the fact that anonymity can be broken, because we have the procedural means to see that privacy is still preserved.
This blog post builds a little further on my ecological ideas about the state of digital identity, first presented at the AusCERT 2011 conference. I have submitted a fresh RSAC 2013 speaker proposal where I hope to show a much more fully developed memetic model.
The past twenty years has seen a great variety of identity methods and devices emerge in the digital marketplace. In parallel, Internet business in many sectors has developed under the existing metasystems of laws, sectoral regulations, commercial contracts, industry codes, and traditional risk management arrangements.
As with Darwin's finches, the very variety of identity methods suggests an ecological explanation. It seems most likely that different methods have evolved in response to different environmental pressures.
The orthodox view today is that we are given a plurality of identities from the many organisations we do business with. Our bank account is thought to be an discrete identity, as is our employment, our studentship, our membership of a professional body, and our belonging to a social network. Identity federation seeks to take an identity out of its original context, and present it in another, so that we can strike up new relationships without having to repeat the enrolment processes. But in practice, established identities are brittle; they don't bend easily to new uses unanticipated by their original issuers. Even superficially similar identities are not readily relied upon, because of the contractual fine print. Famously in Australia, one cannot open a fresh bank account on the basis of having an existing accout at another bank, even though their identification protocols are essentially identical, under the law. Similarly, government agencies have historically struggled to cross-recognise each other's security clearances.
I have come to the conclusion that we have abstracted "identity" at too high a level. We need to drop down a level or two and make smarter use of how identities are constructed. It shouldn't be hard to do; we have a lot of the conceptual apparatus already. In particular, one of the better definitions of digital identity holds that it is a set of assertions or claims [Ref:The Laws of Identity]. Instead of federating rolled-up high level identities, we would have an easier time federating selected assertions.
Now, generalising beyond the claims and assertions, consider that each digital identity is built from a broad ensemble of discrete technological and procedural traits, spanning such matters as security techniques, registration processes, activation processes, identity proofing requirements (which are regulated in some industries like banking and the healthcare professions), user interface, algorithms, key lengths, liability arrangements, and so on. These traits together with the overt identity assertions -- like date of birth, home address and social security number -- can be seen as memes: heritable units of business and technological "culture".
The ecological frame leads us to ask: where did these traits come from? What forces acted upon the constituent identity memes to create the forms we see today? Well, we can see that different selection pressures operate in different business environments, and that memes evolve over time in response. Example of selection pressures include fraud, privacy (with distinct pressures to both strengthen and weaken privacy playing out before our eyes), convenience, accessibility, regulations (like Basel II, banking KYC rules, medical credentialing rules, and HSPD-12), professional standards, and new business models like branch-less banking and associated Electronic Verification of Identity. Each of these factors shift over time, usually moving in and out of equilibrium with other forces, and the memes shift too. Successful memes -- where success means that some characteristic like key length or number of authentication factors has proven effective in reducing risk -- are passed on to successive generations of identity solution. The result is that at any time, the ensemble of traits that make up an "identity" in a certain context represents the most efficient way to manage misidentification risks.
The "memome" of any given rolled-up identity -- like a banking relationship for instance -- is built from all sorts of ways doing things, as illustrated. We can have different ways of registering new banking customers, checking their bona fides, storing their IDs, and activating their authenticators. Over time, these component memes develop in different ways, usually gradually, as the business environment changes, but sometimes in sudden step-changes when the environment is occasionally disrupted by e.g. a presidential security directive, or a w business model like branch-less banking. And as with real genomes, identity memes interact, changing how they are expressed, even switching each other on and off.
As they say, things are the way they are because they got that way.
I reckon before we try to make identities work across contexts they were not originally intended for, we need to first understand the evolutionary back stories for each of the identity memes, and the forces that shaped them to fit certain niches in business ecosystems. Then we may be able to literally do memetic engineering to adapt a core set of relationships to new business settings.
The next step is to rigorously document some of the back stories, and to see if the "phylomemetics" really hangs together.
Quantum computing continues to make strides. Now they've made a chip to execute Shor's quantum foctorisation algorithm. Until now, quantum computers were built from bench-loads of apparatus, and had yet to be fabricated in solid state. So this is pretty cool, taking QC from science into engineering.
The promise of quantum computing is that it will eventually render today's core cryptography obsolete, by making it possible to factorise large numbers very quickly. The RSA algorithm for now is effectively unbreakable because its keys are the products of prime numbers hundreds of digits long. The product of two primes can be computed in split seconds; but to find the factors by brute force - and thus crack the code - takes billions of computer-years.
I'm curious about one thing. Current prototype quantum computers are built with just a few qubits because of the "coherence" problem (so they can only factorise little numbers like 15 = 3 x 5). The machinery has to hold all the qubits in a delicate state of quantum uncertainty for long enough to complete the computation. The more qubits they have, the harder it is to maintain coherence. The task ahead is to scale up past the proof-of-concept stage to a manage quite a lot of qubits and thus be able to crack 4096-bit RSA keys for instance.
Evidently it's very hard to build say a 100 qubit quantum computer right now. So my question is: What is the relationship between the difficulty of maintaining coherence and the number of qubits concerned? Is it exponentially difficult? Or worse?
Because if it is, then the way to stay ahead of quantum computing attack might be to simply go out to RSA keys tens of thousands of digits long.
If engineering quantum computers is inherently difficult (as is miniaturizing transistors), then no matter how good we get at it, arbitrarily long qubit computations will remain costly, and the arms race between cryptographic key length and brute force attack may continue indefinitely. The end of conventional encryption is not yet in sight.
These days it’s common to hear the modest disclaimer that there are some questions science can’t answer. I most recently came across such a show of humility by Dr John Kirk speaking on ABC Radio National’s Ockham’s Razor . Kirk says that “science cannot adjudicate between theism and atheism” and insists that science cannot bridge the divide between physics and metaphysics. Yet surely the long history of science shows that divide is not hard and fast.
Science is not merely about the particular answers; it’s about the steady campaign on all that is knowable.
Science demystifies. Way before having all the detailed answers, each fresh scientific wave works to banish the mysterious, that which previously lay beyond human comprehension.
Textbook examples are legion where new sciences have rendered previously fearsome phenomena as firstly explicable and then often manageable: astronomy, physiology, meteorology, sedimentology, seismology, microbiology, psychology and neurology, to name a few.
It's sometimes said that in science, the questions matter more than the answers. Good scientists ask good questions, but great ones show where there is no question anymore.
Once something profound is no longer beyond understanding, that awareness permeates society. Each wave of scientific advance is usually signaled by new technologies, but more vital to the human condition is that science gives us confidence. In an enlightened society, those with no scientific training at all still appreciate that science gets how the world works. Over time this tacit rational confidence has energised modernity, supplanting astrologers, shamans, witch doctors, and even the churches. Laypeople may not know how televisions work, nor nuclear medicine, semiconductors, anaesthetics, antibiotics or fibre optics, but they sure know it’s not by magic.
The arc of science parts mystery’s curtain. Contrary to John Kirk's partitions, science frequently renders the metaphysical as natural and empirically knowable. My favorite example: To the pre-Copernican mind, the Sun was perfect and ethereal, but when Galileo trained his new telescope upon it, he saw spots. These imperfections were shocking enough, but the real paradigm shift came when Galileo observed the sunspots to move across the face, disappear and then return hours later on the other limb. Thus the Sun was shown―in what must have truly been a heart-stopping epiphany―to be a sphere turning on its axis: geometric, humble, altogether of this world, and very reasonably the centre of a solar system as Copernicus had reasoned a few decades earlier. This was science exercising its most profound power, titrating the metaphysical.
An even more dramatic turn was Darwin's discovery that all the world’s living complexity was explicable without god. He thus dispelled teleology (the search for ultimate reason). He not only neutralised the Argument from Design for the existence of god, but also the very need for god. The deepest lesson of Darwinism is that there is simply no need to ask "What am I doing here?" because the wondrous complexity of all of biology, including humanity's own existence are seen to have arisen through natural selection, without a designer, and moreover, without a reason. Darwin himself felt keenly the gravity of this outcome and what it would mean to his deeply religious wife, and for that reason he kept his work secret for so long. It seems philosophers appreciate the deep lessons of Darwinism more than our modest scientists: Karl Marx saw that evolution “deals the death-blow to teleology” and Frederich Nietzsche claimed “God is dead ... we have killed him”.
So why shouldn’t we expect science to continue? Why should we doubt ― or perhaps fear ― its power to remove all mystery? Of course many remaining riddles are very hard indeed, and I know there’s no guarantee science will be able to solve them. But I don't see the logic of rejecting the possibility that it will. Some physicists feel they’re homing in why the physical constants should have their special values. And many cognitive scientists and philosophers of the mind suspect a theory of consciousness is within reach. I’m not saying anyone yet really gets consciousness yet, but surely most would agree that it just doesn’t feel a total enigma anymore.
Science is more than the books it produces. It’s the power to keep writing new ones.
References. “Why is science such a worry?” Ockham's Razor 18 December 2011 http://www.abc.net.au/radionational/programs/ockhamsrazor/ockham27s-razor-18-december-2011/3725968
I’ve always been uneasy about the term “ecosystem” being coopted when what is really meant is “marketplace”. There’s nothing wrong with marketplaces! They are where we technologists study problems and customer needs, launch our solutions, and jockey for share. The suddenly-orthodox “Identity Ecosystem” as expressed in the NSTIC is an elaborate IT architecture, that defines specific roles for users, identity providers, relying parties and other players. And the proposed system is not yet complete; many commentators have anticipated that NSTIC may necessitate new legislation to allocate liability.
So it’s really not an "ecosystem”. True ecosystems evolve; they are not designed. If NSTIC isn’t even complete, then badging it an “ecosystem” seems more marketing than ecology; it’s an effort to raise NSTIC as a policy initiative above the hurly burly of competitive IT.
My unease about “ecosystem” got me thinking about ecology, and whether genuine ecological thinking could be useful to analyse the digital identity environment. I believe there is potential here for a new and more powerful way to frame the identity problem.
We are surrounded by mature business ecosystems, in which different sectors and communities-of-interest, large and small, have established their own specific arrangements for managing risk. A special part of risk management is the way in which customers, users, members or business partners are identified. There is almost always a formal protocol by which an individual joins one of these communities-of-interest; that is, when they join a company, qualify for a professional qualification, or open a credit account. Some of these registration protocols are set freely by employers, merchants, associations and the like; others have a legislated element, in regulated industries like aviation, healthcare and finance; and in some cases registration is almost trivial, as when you sign up to a blog site. The conventions, rules, professional charters, contracts, laws and regulations that govern how people do business in different contexts are types of memes. They have evolved in different contexts to minimise risk, and have literally been passed on from one generation to another.
As business environments change, risk management rules in response change too. And so registration processes are subject to natural selection. An ecological treatment of identity recognises that “selection pressures” act on those rules. For instance, to deal with increasing money laundering and terrorist financing, many prudential regulators have tightened the requirements for account opening. To deal with ID theft and account takeover, banks have augmented their account numbers with Two Factor Authentication. The US government’s PIV-I rules for employees and contractors were a response to Homeland Security Presidential Directive HSPD-12. Cell phone operators and airlines likewise now require extra proof of ID. Medical malpractice in various places has led hospitals to tighten their background checks on new staff.
It's natural and valuable, up to a point, to describe "identities" being provided to people acting in these different contexts. This abstraction is central to the Laws of Identity. Unfortunately the word identity is suggestive of a sort of magic property that can be taken out of one context and used in another. So despite the careful framing of the Laws of Identity, it seems that people still carry around a somewhat utopian idea of digital identity, and a tacit belief in the possibility of a universal digital passport (or at least a greatly reduced number of personal IDs). I have argued elsewhere that the passport is actually implausible. If I am right about that, then what is sorely needed next is a better frame for understanding digital identity, a perspective that helps people stay away of the dangerous temptation of a single passport, and to understand that a plurality of identities is the natural state of being.
All modern identity thinking recognises that digital identities are context dependent. Yet it seems that federated identity projects have repeatedly underestimated the strength of that dependence. The federated identity movement is based on an optimism that we can change context for the IDs we have now, and still preserve some recognisable and reusable core identity. Or alternatively, create a new smaller set of IDs that will be useful for transacting with a superset of services. Such “interoperability” has only been demonstrated to date in near-trivial use cases like logging onto blog sites with unverified OpenIDs, or Facebook or Twitter handles. More sophisticated re-use of identities across context – such as the Australian banking sector’s ill-fated Trust Centre project – have foundered, even when there is pre-existing common ground in identification protocols.
The greatest challenge in federated identity is getting service and identity providers that are accustomed to operating in their own silos, to accept risks arising from identification and/or authentication performed on/by their members in other silos. This is where the term “identity” can be distracting. It is important to remember that the “identities” issued by banks, government agencies, universities, cell phone companies, merchants, social networks and blog sites are really proxies for the arrangements to which members have signed up.
This is why identity is so very context dependent, and so why some identities are so tricky to federate.
If we think ecologically, then a better word for the “context” that an identity operates in may be niche. This term properly evokes the tight evolved fit between an identity and the setting in which it is meaningful. In most cases, if we want to understand the natural history of identities, we can look to the existing business ecosystem from where they came. The environmental conditions that shaped the particular identities issued by banks, credit card companies, employers, governments, professional bodies etc. are not fundamentally changed by the Internet. As such, we should expect that when these identities transition from real world to digital, their properties – especially their “interoperability” and liability arrangements -- cannot change a great deal. It is only the pure cyber identities like blogger names, OSN handles and gaming avatars that are highly malleable, because their environmental niches are not so specific.
As an aside, noting how they spread far and wide, and too quickly for us to predict the impacts, maybe it's accurate to liken OpenID and Facebook Connect to weeds!
A lot more work needs to be done on this ecological frame, but I’m thinking we need a new name, distinct from "federated" identity. It seems to me that most business, whether it be online or off, turns on Evolved Identities. If we appreciate the natural history of identities as having evolved, then we should have more success with digital identity. It will become clearer which identities can federate easily, and which cannot, because of their roots in real world business ecosystems.
Taking a digital identity (like a cell phone account) out of its natural niche and hoping it will interoperate in another niche (like banking) can be compared to taking a tropical salt water fish and dropping it into a fresh water tank. If NSTIC is an ecosystem, it is artificial. As such it may be as fragile as an exotic botanic garden or tropical aquarium. I fear that full blown federated identity systems are going to need constant care and intervention to save them from collapse.
Astronomy is the archetypal pure science. It can seem frankly pointless, especially when the projects like the Hubble Space Telescope cost a billion dollars or more. The point of astronomy is a vexed question, and one that politicians often have to engage with.
Some pundits justify astronomy on the grounds of the spinoffs, like better radio antenna technology. Some say our destiny is to emigrate from the planet, so we'd better start somewhere. Others simply assert unapologetically that peering into the heavens is what homo sapiens does, at any cost.
Here's a different rationale. To my mind (as an ex astronomer) the deepest practical value of astronomy is it enables us to perform experiments that are just too grand to ever be done on earth. The limits of earth-bound experiments of course change over time, but during each era, astronomy has furnished answers to questions that elude terrestrial investigation.
For example, astronomers were the first to:
And there must be other examples.
I mention General Relativity, which was another one of those apparently academic pursuits, until it came to the rescue a few years ago in the most practical way. Soon after the Global Positioning Satellite (GPS) system came online, its results started drifting. Engineers realised quickly that the high precision clocks in orbit were getting out of sync with those on the ground. And the explanation turned out to be gravity. According to General Relativity, a clock will run more slowly in a gravitational field, and because the force of gravity is slightly lower above the Earth than on the surface, the GPS clocks were running faster than expected. The effect was just a few parts in a billion, but enough to cause the positioning results to drift by a few metres, and whatsmore, to get worse over time. By reprogramming the GPS controllers to account for gravitational time dilation the problems were solved and the system has been stable ever since. So if it wasn't for Einstein, your sat nav wouldn't work, and you'd be lost. Or rather, more lost than you are now.
So astronomy is supremely practical. And here's another thing. Astronomy occasionally provides the most profound and compelling truths about reality. My favorite example is a classic gooose-bump moment in the history of science: Galileo's discovery of sunspots and his appreciation of what they meant. Until then, everyone thought the Sun was a perfect unchanging disc of light. After projecting the Sun through his newly uinvented teleschope onto a white sheet, Galileo soon noticed blemishes which rather put paid to the perfection. But much more importantly, Galileo saw that the spots were moving. They drifted across the face of the disc over a few hours, disappeared, and then returned, on the other side where they began. In today's idiom, he might have said "O.M.F.G!" The sun turns out to be a turning ball! It was a critical moment of unification in human culture, contributing more evidence to the realisation that all of the things and all of the stuff in the universe are fundamentally the same. The Coppernican revolution was much more than star gazing: it reset humankind's understanding of the mystical, reinforcing that everything is ordered, and ordinary. Corporeal, and explicable to human minds.
It's not the only time a radical, instant, disruptive re-framing of our place in the world has been delivered by astronomy. It wasn't really so long ago -- in the early 20th century -- that Edwin Hubble and his peers established that the 'nebulae' were actually galaxies just like the Milky Way, and that the universe therefore had to be tens of millions of times bigger than previously thought possible.
This kind of revelation about our place in the scheme of things only comes from astronomy.
Update September 2012
The recent discovery that junk DNA is not actually junk rather reinforces my long standing thesis, espoused below, that we don't know enough about how genes work to be able to validate genetic engineering artifacts by testing alone. I point out that computer programs are only validated by a mixture of testing, code inspection and theory, all of which is based on knowing how the code works at the instruction level. But we don't have a terribly complete picture of how genes interact. We always knew they were massively parallel, and now it turns out that junk DNA has some sort of role in gene expression across the whole of the genome, raising the combinatorial complexity enormously. This tells me that we have little idea how modifications at one point in the genome can impact the functioning at any number of other points (but it hints at an explanation as to why human beings are so much more complex than nematodes despite having only a relatively small number of additional raw genetic instructions).
And now there is news that a cow in New Zealand, genetically engineered in respect of one allergenic protein, was born with no tail. Now it's too early to be able to blame the GM for this oddity, but equally, the junk DNA finding surely undermines the confidence that any genetic engineer can have in predicting that their changes cannot have had unexpected and really unpredictable side effects.
Original post, 15 Jan 2011
As a software engineer years ago I developed a deep unease about genetic engineering and genetically modified organisms (GM). The software experience suggests to me that GM products cannot be verifiable given the state of our knowledge about how genes work. I’d like to share my thoughts.
Genetic engineering proponents seem to believe the entire proof of a GM pudding is in the eating. That is, if trials show that GM food is not toxic, then it must be safe, and there isn't anything else to worry about. The lesson I want others to draw from the still new discipline of software engineering is there is more to the verification of correctness in complex programs than tesing the end product.
Recently I’ve come across an Australian government-sponsored FAQ Arguments for and against gene technology (May 2010) that supposedly provides a balanced view of both sides of the GM debate. Yet it sweeps important questions under the rug.[At one point the paper invites readers to think about whether agriculture is natural. It’s a highly loaded question grounded in the soothing proposition that GM is simply an extension of the age old artificial selection that gave us wheat, Merinos and all those different potatoes. The question glosses over the fact that when genes recombine under normal sexual reproduction, cellular mechanisms constrain where each gene can end up, and most mutations are still-born. GM is not constrained; it jumps levels. It is quite unlike any breeding that has gone before.]
Genes are very frequently compared with computer software, for good reason. I urge that the comparison be examined more closely, so that lessons can be drawn from the long standing “Software Crisis”.
Each gene codes for a specific protein. That much we know. Less clear is how relatively few genes -- 20,000 for a nematode; 25,000 for a human being -- can specify an entire complex organism. Science is a long way from properly understanding how genes specify bodies, but it is clear that each genome is an immensely intricate ensemble of interconnected biochemical short stories. We know that genes interact with each other, turning each other on and off, and more subtly influencing how each is expressed. In software parlance, genetic codes are executed in a massively parallel manner. This combinatorial complexity is probably why I can share fully half of my genes with a turnip, and have an “executable file” in DNA that is only 20% longer than that of a worm, and yet I can be so incredibly different from those organisms.
If genomes are like programs then let’s remember they have been written achingly slowly over eons, to suit the circumstances of a species. Genomes are revised in a real world laboratory over billions of iterations and test cases, to a level of confidence that software engineers can’t even dream of. Brassica napus.exe (i.e. canola) is at v1000000000.1. Tinkering with isolated parts of this machinery, as if it were merely some sort of wiki with articles open to anyone to edit, could have consequences we are utterly unable to predict.
In software engineering, it is received wisdom that most bugs result from imprudent changes made to existing programs. Furthermore, editing one part of a program can have unpredictable and unbounded impacts on any other part of the code. Above all else, all but the very simplest software in practice is untestable. So mission critical software (like the implantable defibrillator code I used to work on) is always verified by a combination of methods, including unit testing, system testing, design review and painstaking code inspection. Because most problems come from human error, software excellence demands formal design and development processes, and high level programming languages, to preclude subtle errors that no amount of testing could ever hope to find.
How many of these software quality mechanisms are available to genetic engineers? Code inspection is moot when we don’t even know how genes normally interact with one another; how can we possibly tell by inspection if an artificial gene will interfere with the “legacy” code?
What about the engineering process? It seems to me that GM is akin to assembly programming circa 1960s. The state-of-the-art in genetic engineering is nowhere near even Fortran, let alone modern object oriented languages.
Can today’s genetic engineers demonstrate a rigorous verification regime, given the reality that complex software programs are inherently untestable?
We should pay much closer attention to the genes-as-software analogy. Some fear GM products because they are unnatural; others because they are dominated by big business and a mad rush to market. I simply say let’s slow down until we’re sure we know what we're doing.