Lockstep

The beginning of privacy!

The headlines proclaim that the newfound ability to re-identify anonymous DNA donors means The End Of Privacy!

No, it doesn't; it only means the end of anonymity.

Anonymity is not the same thing as privacy. Anonymity keeps people from knowing what you're doing, and it's a vitally important quality in many settings. But usually we want people (at least some people) to know what we're up to, so long as they respect that knowledge. That's what privacy is all about. Anonymity is a terribly blunt instrument for protecting privacy, and it's fragile too. If anonymity is all you have, then you're in deep trouble when someone manages to defeat it.

New information technologies have clearly made anonymity more difficult, yet it does not follow that we must lose our privacy. Instead, these developments bring into stark relief the need for stronger regulatory controls that compel restraint in the way third parties deal with Personal Information that comes into their possession.

A great example is Facebook's use of facial recognition. When Facebook members innocently tag one another in photos, Facebook creates biometric templates with which it then automatically processes all photo data (previously anonymous), looking for matches. This is how they create tag suggestions, but Facebook is notoriously silent on what other applications it has for facial recognition. Now and then we get a hint, as with news of the Facedeals start-up last year. Facedeals accesses Facebook's templates (under conditions that remain unclear) and uses them to spot customers as they enter a store, to automatically check them in. It's classic social technology: kinda sexy, kinda creepy, but clearly in breach of Collection, Use and Disclosure privacy principles.

And indeed, European regulators have found that Facebook's facial recognition program is unlawful. The chief problem is that Facebook never properly disclosed to members what goes on when they tag one another, and they never sought consent to create biometric templates with which to subsequently identify people throughout their vast image stockpiles. Facebook has been forced to shut down their facial recognition operations in Europe, and they've destroyed their historical biometric data.

So privacy regulators in many parts of the world have real teeth. They have established that re-identification of anonymous data by facial recognition is unlawful, and they have managed to stop a very big and powerful company from doing it.

This is how we should look at the implications of the DNA 'hacking'. Indeed, Melissa Gymrek from the Whitehead Institute said in an interview: "I think we really need to learn to deal with the fact that we cannot ever make data sets truly anonymous, and that I think the key will be in regulating how we are allowed to use this genetic data to prevent it from being used maliciously."

Perhaps this episode will bring even more attention to the problem in the USA, and further embolden regulators to enact broader privacy protections there. Perhaps the very extremity of the DNA hacking spells not the end of privacy so much as its beginning.

Posted in Social Media, Science, Privacy, Biometrics, Big Data

Letter to Science: Re-identification of DNA may need ethics approval

Introduction

I had a letter published in Science magazine about the recently publicised re-identification of anonymously donated DNA data. It has been shown that there is enough named genetic information online, in genealogical databases for instance, that anonymous DNA posted in research databases can be re-identified. This is a sobering result indeed. But does it mean that 'privacy is dead'?

No. The fact is that re-identification of erstwhile anonymous data represents an act of collection of PII and is subject to the Collection Limitation Principle in privacy law around the world. This is essentially the same scenario as Facebook using biometric facial recognition to identify people in photos. European regulators recently found Facebook to have breached privacy law and have forced Facebook to shut down their facial recognition feature.

I expect that the very same legal powers will permit regulators to sanction the re-identification of DNA. There are legal constraints on what can be done with 'anonymous' data no matter where you get it from: under some data privacy laws, attaching names to such data constitutes a Collection of PII, and as such is subject to consent rules and all sorts of other principles. As a result, bioinformatics researchers will have to tread carefully, justifying their ends and their means before ethics committees. And corporations that seek to exploit the ability to put names on anonymous genetic data may face the force of the law, as Facebook did.

To summarise: Let's assume Subject S donates her DNA, ostensibly anonymously, to a Researcher R1, under some consent arrangement which concedes there is a possibility that S will be re-identified. And indeed, some time later, an independent researcher R2 does identify S as the source of the DNA sample. The fact that many commentators seem oblivious to is this: R2 has Collected Personal Information (or PII) about S. If R2 has no relationship with S, then S has not consented to this new collection of her PII. In jurisdictions with strict Collection Limitation (like the EU, Australia and elsewhere), it seems to me to be a legal privacy breach for R2 to collect PII by way of DNA re-identification without express consent, regardless of whether R1 has conceded to S that it might happen. Even in the US, where the protections might not be so strict, there remains a question of ethics: should R2 conduct themselves in a manner that might be unlawful elsewhere?

The text of my letter to Science follows, and after that, I'll keep posting follow ups.

Legal Limits to Data Re-Identification

Science 8 February 2013:
Vol. 339, No. 6120, p. 647

Yaniv Erlich at the Whitehead Institute for Biomedical Research used his hacking skills to decipher the names of anonymous DNA donors ("Genealogy databases enable naming of anonymous DNA donor," J. Bohannon, 18 January, p. 262). A little-known legal technicality in international data privacy laws could curb the privacy threats of reverse identification from genomes. "Personal information" is usually defined as any data relating to an individual whose identity is readily apparent from the data. The OECD Privacy Principles are enacted in over 80 countries worldwide [1]. Privacy Principle No. 1 states: "There should be limits to the collection of personal data and any such data should be obtained by lawful and fair means and, where appropriate, with the knowledge or consent of the data subject." The principle is neutral regarding the manner of collection. Personal information may be collected directly from an individual or indirectly from third parties, or it may be synthesized from other sources, as with "data mining."

Computer scientists and engineers often don't know that recording a person's name against erstwhile anonymous data is technically an act of collection. Even if the consent form signed at the time of the original collection includes a disclaimer that absolute anonymity cannot be guaranteed, re-identifying the information later signifies a new collection. The new collection of personal information requires its own consent; the original disclaimer does not apply when third parties take data and process it beyond the original purpose for collection. Educating those with this capability about the legal meaning of collection should restrain the misuse of DNA data, at least in those jurisdictions that strive to enforce the OECD principles.

It also implies that bioinformaticians working "with little more than the Internet" to attach names to samples may need ethics approval, just as they would if they were taking fresh samples from the people concerned.

Stephen Wilson
Lockstep Consulting Pty Ltd
Five Dock Sydney, NSW 2046, Australia.
E-mail: swilson@lockstep.com.au

[1] Graham Greenleaf, "Global data privacy laws: 89 countries, and accelerating", Privacy Laws & Business International Report, Issue 115, Special Supplement, February 2012.

Followup

In an interview with Science Magazine on Jan 18, the Whitehead Institute's Melissa Gymrek discussed the re-identification methods and the potential to protect against them. She concluded: "I think we really need to learn to deal with the fact that we cannot ever make data sets truly anonymous, and that I think the key will be in regulating how we are allowed to use this genetic data to prevent it from being used maliciously."

I agree completely. We need regulations. Elsewhere I've argued that anonymity is an inadequate way to protect privacy, and that we need a balance of regulations and Privacy Enhancing Technologies. And it's for this reason that I am not fatalistic about the fact that anonymity can be broken, because we have the procedural means to see that privacy is still preserved.

Posted in Science, Privacy, Big Data

Memetic engineering our identities

This blog post builds a little further on my ecological ideas about the state of digital identity, first presented at the AusCERT 2011 conference. I have submitted a fresh RSAC 2013 speaker proposal where I hope to show a much more fully developed memetic model.

The past twenty years have seen a great variety of identity methods and devices emerge in the digital marketplace. In parallel, Internet business in many sectors has developed under the existing metasystems of laws, sectoral regulations, commercial contracts, industry codes, and traditional risk management arrangements.

[Figure: Variety of identities]


As with Darwin's finches, the very variety of identity methods suggests an ecological explanation. It seems most likely that different methods have evolved in response to different environmental pressures.

The orthodox view today is that we are given a plurality of identities by the many organisations we do business with. Our bank account is thought to be a discrete identity, as is our employment, our studentship, our membership of a professional body, and our belonging to a social network. Identity federation seeks to take an identity out of its original context and present it in another, so that we can strike up new relationships without having to repeat the enrolment processes. But in practice, established identities are brittle; they don't bend easily to new uses unanticipated by their original issuers. Even superficially similar identities are not readily relied upon, because of the contractual fine print. Famously in Australia, one cannot open a fresh bank account on the basis of having an existing account at another bank, even though the two banks' identification protocols under the law are essentially identical. Similarly, government agencies have historically struggled to cross-recognise each other's security clearances.

I have come to the conclusion that we have abstracted "identity" at too high a level. We need to drop down a level or two and make smarter use of how identities are constructed. It shouldn't be hard to do; we have a lot of the conceptual apparatus already. In particular, one of the better definitions of digital identity holds that it is a set of assertions or claims [Ref:The Laws of Identity]. Instead of federating rolled-up high level identities, we would have an easier time federating selected assertions.
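To make the idea concrete, here is a minimal sketch in Python of an identity modelled as a bundle of claims, from which only the assertions a new context needs are released. All the names and attributes here are hypothetical illustrations, not any particular federation scheme:

```python
# A minimal sketch of identity-as-claims, per the "set of assertions"
# definition. All names and attributes are hypothetical illustrations.
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Claim:
    attribute: str     # e.g. "over_18", "account_status"
    value: str
    asserted_by: str   # the authority standing behind the claim

@dataclass
class Identity:
    holder: str
    claims: List[Claim] = field(default_factory=list)

    def select(self, attributes: Set[str]) -> List[Claim]:
        """Release only the claims a new context actually needs,
        rather than the whole rolled-up identity."""
        return [c for c in self.claims if c.attribute in attributes]

# A bank-issued identity; a merchant relies on a single assertion,
# not the entire banking relationship.
bank_id = Identity("alice", [
    Claim("over_18", "true", "Example Bank"),
    Claim("account_status", "good", "Example Bank"),
    Claim("home_address", "1 Example St", "Example Bank"),
])
print(bank_id.select({"over_18"}))
```

The point of the sketch is that `select` federates an assertion, not an "identity": the merchant never learns the account details, and the bank's liability can be scoped to the one claim it actually vouched for.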

Now, generalising beyond the claims and assertions, consider that each digital identity is built from a broad ensemble of discrete technological and procedural traits, spanning such matters as security techniques, registration processes, activation processes, identity proofing requirements (which are regulated in some industries like banking and the healthcare professions), user interface, algorithms, key lengths, liability arrangements, and so on. These traits together with the overt identity assertions -- like date of birth, home address and social security number -- can be seen as memes: heritable units of business and technological "culture".

[Figure: IEEE Part B diagram]


The ecological frame leads us to ask: where did these traits come from? What forces acted upon the constituent identity memes to create the forms we see today? Well, we can see that different selection pressures operate in different business environments, and that memes evolve over time in response. Examples of selection pressures include fraud, privacy (with distinct pressures to both strengthen and weaken privacy playing out before our eyes), convenience, accessibility, regulations (like Basel II, banking KYC rules, medical credentialing rules, and HSPD-12), professional standards, and new business models like branch-less banking and the associated Electronic Verification of Identity. Each of these factors shifts over time, usually moving in and out of equilibrium with the other forces, and the memes shift too. Successful memes -- where success means that some characteristic like key length or number of authentication factors has proven effective in reducing risk -- are passed on to successive generations of identity solutions. The result is that at any time, the ensemble of traits that makes up an "identity" in a certain context represents the most efficient way to manage misidentification risks.

The "memome" of any given rolled-up identity -- like a banking relationship for instance -- is built from all sorts of ways doing things, as illustrated. We can have different ways of registering new banking customers, checking their bona fides, storing their IDs, and activating their authenticators. Over time, these component memes develop in different ways, usually gradually, as the business environment changes, but sometimes in sudden step-changes when the environment is occasionally disrupted by e.g. a presidential security directive, or a w business model like branch-less banking. And as with real genomes, identity memes interact, changing how they are expressed, even switching each other on and off.

As they say, things are the way they are because they got that way.

I reckon before we try to make identities work across contexts they were not originally intended for, we need to first understand the evolutionary back stories for each of the identity memes, and the forces that shaped them to fit certain niches in business ecosystems. Then we may be able to literally do memetic engineering to adapt a core set of relationships to new business settings.

The next step is to rigorously document some of the back stories, and to see if the "phylomemetics" really hangs together.

Posted in Science, Identity, Federated Identity

Is quantum computing all it's cracked up to be?

Quantum computing continues to make strides. Now researchers have made a chip to execute Shor's quantum factorisation algorithm. Until now, quantum computers were built from bench-loads of apparatus and had yet to be fabricated in solid state. So this is pretty cool, taking QC from science into engineering.

The promise of quantum computing is that it will eventually render today's core cryptography obsolete, by making it possible to factorise large numbers very quickly. The RSA algorithm is for now effectively unbreakable because its keys are the products of prime numbers hundreds of digits long. The product of two primes can be computed in a split second; but to find the factors by brute force - and thus crack the code - takes billions of computer-years.
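You can feel the asymmetry even at toy scale. A minimal Python sketch, using toy primes only (real RSA moduli are hundreds of digits long):

```python
# The asymmetry RSA rests on: multiplying two primes is instant, while
# recovering them from the product is a brute-force search that scales
# hopelessly with key size. Toy primes only.
p, q = 1_000_003, 1_000_033
n = p * q                       # one direction: a split second

def trial_factor(n):
    """Naive factoring by trial division of an odd n: about sqrt(n) steps."""
    f = 3
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 2
    return None

print(trial_factor(n))   # ~500,000 divisions for this 40-bit toy modulus;
                         # brute-forcing a 2048-bit one needs ~2^1024 of them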

I'm curious about one thing. Current prototype quantum computers are built with just a few qubits because of the 'coherence' problem (so they can only factorise little numbers like 15 = 3 x 5). The machinery has to hold all the qubits in a state of quantum uncertainty for long enough to complete the computation. The more qubits there are, the harder it is to maintain coherence. The task ahead is to scale up past the proof-of-concept stage to manage a few thousand qubits, and thus be able to crack 2048-bit RSA keys, for instance.
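For a sense of scale, one published circuit for Shor's algorithm (Beauregard, 2003) factorises an n-bit number with roughly 2n+3 logical qubits. A back-of-envelope sketch under that assumption:

```python
# Back-of-envelope logical-qubit counts, assuming the 2n+3-qubit circuit
# for Shor's algorithm (Beauregard, 2003). Error correction would multiply
# the physical qubit count well beyond these figures.
def shor_logical_qubits(rsa_bits):
    return 2 * rsa_bits + 3

for bits in (512, 2048, 16384):
    print(f"RSA-{bits}: ~{shor_logical_qubits(bits)} logical qubits")
```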

Evidently it's hard to build, say, a 1000-qubit quantum computer right now. So my question is: what is the relationship between the difficulty of maintaining coherence and the number of qubits concerned? Is it exponentially difficult?

Because if it is, then the way to stay ahead of quantum computing attack might be to simply go out to RSA keys tens of thousands of digits long.

Posted in Security, Science

Science is more than the books it produces

These days it’s common to hear the modest disclaimer that there are some questions science can’t answer. I most recently came across such a show of humility by Dr John Kirk speaking on ABC Radio National’s Ockham’s Razor [1]. Kirk says that “science cannot adjudicate between theism and atheism” and insists that science cannot bridge the divide between physics and metaphysics. Yet surely the long history of science shows that divide is not hard and fast.

Science is not merely about the particular answers; it’s about the steady campaign on all that is knowable.

Science demystifies. Way before having all the detailed answers, each fresh scientific wave works to banish the mysterious, that which previously lay beyond human comprehension.

Textbook examples are legion where new sciences have rendered previously fearsome phenomena as firstly explicable and then often manageable: astronomy, physiology, meteorology, sedimentology, seismology, microbiology, psychology and neurology, to name a few.

It's sometimes said that in science, the questions matter more than the answers. Good scientists ask good questions, but great ones show where there is no question anymore.

Once something profound is no longer beyond understanding, that awareness permeates society. Each wave of scientific advance is usually signaled by new technologies, but more vital to the human condition is that science gives us confidence. In an enlightened society, those with no scientific training at all still appreciate that science gets how the world works. Over time this tacit rational confidence has energised modernity, supplanting astrologers, shamans, witch doctors, and even the churches. Laypeople may not know how televisions work, nor nuclear medicine, semiconductors, anaesthetics, antibiotics or fibre optics, but they sure know it’s not by magic.

The arc of science parts mystery’s curtain. Contrary to John Kirk's partitions, science frequently renders the metaphysical as natural and empirically knowable. My favorite example: To the pre-Copernican mind, the Sun was perfect and ethereal, but when Galileo trained his new telescope upon it, he saw spots. These imperfections were shocking enough, but the real paradigm shift came when Galileo observed the sunspots to move across the face, disappear and then return hours later on the other limb. Thus the Sun was shown―in what must have truly been a heart-stopping epiphany―to be a sphere turning on its axis: geometric, humble, altogether of this world, and very reasonably the centre of a solar system as Copernicus had reasoned a few decades earlier. This was science exercising its most profound power, titrating the metaphysical.

An even more dramatic turn was Darwin's discovery that all the world's living complexity is explicable without god. He thus dispelled teleology (the search for ultimate reason). He not only neutralised the Argument from Design for the existence of god, but also the very need for god. The deepest lesson of Darwinism is that there is simply no need to ask "What am I doing here?", because the wondrous complexity of all of biology, including humanity's own existence, is seen to have arisen through natural selection, without a designer, and moreover, without a reason. Darwin himself felt keenly the gravity of this outcome and what it would mean to his deeply religious wife, and for that reason he kept his work secret for so long. It seems philosophers appreciate the deep lessons of Darwinism more than our modest scientists: Karl Marx saw that evolution "deals the death-blow to teleology" and Friedrich Nietzsche claimed "God is dead ... we have killed him".

So why shouldn’t we expect science to continue? Why should we doubt ― or perhaps fear ― its power to remove all mystery? Of course many remaining riddles are very hard indeed, and I know there’s no guarantee science will be able to solve them. But I don't see the logic of rejecting the possibility that it will. Some physicists feel they’re homing in on why the physical constants have their special values. And many cognitive scientists and philosophers of the mind suspect a theory of consciousness is within reach. I’m not saying anyone really gets consciousness yet, but surely most would agree that it just doesn’t feel a total enigma anymore.

Science is more than the books it produces. It’s the power to keep writing new ones.

References

[1] “Why is science such a worry?”, Ockham's Razor, 18 December 2011. http://www.abc.net.au/radionational/programs/ockhamsrazor/ockham27s-razor-18-december-2011/3725968

Posted in Science, Culture

An improved frame for understanding Digital Identity

I’ve always been uneasy about the term “ecosystem” being coopted when what is really meant is “marketplace”. There’s nothing wrong with marketplaces! They are where we technologists study problems and customer needs, launch our solutions, and jockey for share. The suddenly-orthodox “Identity Ecosystem” as expressed in the NSTIC is an elaborate IT architecture that defines specific roles for users, identity providers, relying parties and other players. And the proposed system is not yet complete; many commentators have anticipated that NSTIC may necessitate new legislation to allocate liability.

So it’s really not an “ecosystem”. True ecosystems evolve; they are not designed. If NSTIC isn’t even complete, then badging it an “ecosystem” seems more marketing than ecology; it’s an effort to raise NSTIC as a policy initiative above the hurly burly of competitive IT.

My unease about “ecosystem” got me thinking about ecology, and whether genuine ecological thinking could be useful to analyse the digital identity environment. I believe there is potential here for a new and more powerful way to frame the identity problem.

We are surrounded by mature business ecosystems, in which different sectors and communities-of-interest, large and small, have established their own specific arrangements for managing risk. A special part of risk management is the way in which customers, users, members or business partners are identified. There is almost always a formal protocol by which an individual joins one of these communities-of-interest; that is, when they join a company, gain a professional qualification, or open a credit account. Some of these registration protocols are set freely by employers, merchants, associations and the like; others have a legislated element, in regulated industries like aviation, healthcare and finance; and in some cases registration is almost trivial, as when you sign up to a blog site. The conventions, rules, professional charters, contracts, laws and regulations that govern how people do business in different contexts are types of memes. They have evolved in different contexts to minimise risk, and have literally been passed on from one generation to another.

As business environments change, risk management rules change in response. And so registration processes are subject to natural selection. An ecological treatment of identity recognises that “selection pressures” act on those rules. For instance, to deal with increasing money laundering and terrorist financing, many prudential regulators have tightened the requirements for account opening. To deal with ID theft and account takeover, banks have augmented their account numbers with Two Factor Authentication. The US government’s PIV-I rules for employees and contractors were a response to Homeland Security Presidential Directive HSPD-12. Cell phone operators and airlines likewise now require extra proof of ID. Medical malpractice in various places has led hospitals to tighten their background checks on new staff.

It's natural and valuable, up to a point, to describe "identities" being provided to people acting in these different contexts. This abstraction is central to the Laws of Identity. Unfortunately the word identity is suggestive of a sort of magic property that can be taken out of one context and used in another. So despite the careful framing of the Laws of Identity, it seems that people still carry around a somewhat utopian idea of digital identity, and a tacit belief in the possibility of a universal digital passport (or at least a greatly reduced number of personal IDs). I have argued elsewhere that the passport is actually implausible. If I am right about that, then what is sorely needed next is a better frame for understanding digital identity, a perspective that helps people steer clear of the dangerous temptation of a single passport, and to understand that a plurality of identities is the natural state of being.

All modern identity thinking recognises that digital identities are context dependent. Yet it seems that federated identity projects have repeatedly underestimated the strength of that dependence. The federated identity movement is based on an optimism that we can change context for the IDs we have now and still preserve some recognisable and reusable core identity. Or, alternatively, create a new smaller set of IDs that will be useful for transacting with a superset of services. Such “interoperability” has only been demonstrated to date in near-trivial use cases like logging onto blog sites with unverified OpenIDs, or Facebook or Twitter handles. More sophisticated re-use of identities across contexts – such as the Australian banking sector’s ill-fated Trust Centre project – has foundered, even when there is pre-existing common ground in identification protocols.

The greatest challenge in federated identity is getting service and identity providers that are accustomed to operating in their own silos to accept risks arising from identification and/or authentication performed on or by their members in other silos. This is where the term “identity” can be distracting. It is important to remember that the “identities” issued by banks, government agencies, universities, cell phone companies, merchants, social networks and blog sites are really proxies for the arrangements to which members have signed up.

This is why identity is so very context dependent, and so why some identities are so tricky to federate.

If we think ecologically, then a better word for the “context” that an identity operates in may be niche. This term properly evokes the tight evolved fit between an identity and the setting in which it is meaningful. In most cases, if we want to understand the natural history of identities, we can look to the existing business ecosystem from where they came. The environmental conditions that shaped the particular identities issued by banks, credit card companies, employers, governments, professional bodies etc. are not fundamentally changed by the Internet. As such, we should expect that when these identities transition from real world to digital, their properties – especially their “interoperability” and liability arrangements -- cannot change a great deal. It is only the pure cyber identities like blogger names, OSN handles and gaming avatars that are highly malleable, because their environmental niches are not so specific.

As an aside, noting how they spread far and wide, and too quickly for us to predict the impacts, maybe it's accurate to liken OpenID and Facebook Connect to weeds!

A lot more work needs to be done on this ecological frame, but I’m thinking we need a new name, distinct from "federated" identity. It seems to me that most business, whether it be online or off, turns on Evolved Identities. If we appreciate the natural history of identities as having evolved, then we should have more success with digital identity. It will become clearer which identities can federate easily, and which cannot, because of their roots in real world business ecosystems.

Taking a digital identity (like a cell phone account) out of its natural niche and hoping it will interoperate in another niche (like banking) can be compared to taking a tropical salt water fish and dropping it into a fresh water tank. If NSTIC is an ecosystem, it is artificial. As such it may be as fragile as an exotic botanic garden or tropical aquarium. I fear that full blown federated identity systems are going to need constant care and intervention to save them from collapse.

Posted in Science, Language, Identity, Federated Identity, Culture, Security

What's the point of astronomy?

Astronomy is the archetypal 'royal science'. It can seem pointless, especially when projects like the Hubble Space Telescope cost a billion dollars or more. Some justify astronomy on the grounds of the spinoffs, like better radio antenna technology. Some say our destiny is to emigrate from the planet, so we'd better start somewhere. Others simply assert unapologetically that peering into the heavens is what Homo sapiens does, at any cost.

Here's a different rationale. To my mind (as an ex-astronomer) the deepest practical value of astronomy is that it enables us to perform experiments that are just too grand to ever be done on earth. The limits of earth-bound experiments of course change over time, but during each era, astronomy has furnished answers to questions that elude terrestrial investigation.

For example, astronomers were the first to:

– measure the circumference of the earth (by observing the angle of the sun in the sky at noon at different latitudes, in ancient Greece and Egypt; see the sketch after this list)

– measure the speed of light (by timing the eclipses of the moons of Jupiter)

– demonstrate the General Theory of Relativity (by observing the precession of the orbit of Mercury, and measuring during a solar eclipse the bending of starlight by the Sun's gravity)

– indirectly detect gravitational waves (by observing the orbital decay of a binary pulsar).
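The first of these is simple enough to redo in a few lines, using rough historical values (the figures below are the conventional approximations):

```python
# Eratosthenes' measurement, with approximate historical values: at noon
# on the solstice the Sun was overhead at Syene but about 7.2 degrees off
# vertical at Alexandria, roughly 800 km away.
angle_deg = 7.2
baseline_km = 800.0
circumference_km = (360.0 / angle_deg) * baseline_km
print(f"~{circumference_km:.0f} km")  # ~40,000 km, close to the modern value
```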

There must be other examples.

And here's another thing. Obviously it was vital that humankind debunk the early mysticism about the perfection and mystery of the Sun, and the misconception that the Earth is the centre of the universe. It was when we philosophically grew up and left home. The Copernican revolution was much more than star gazing.

So ... one of my all-time favorite goose-bump moments in the history of science is Galileo observing sunspots for the first time, by projecting the Sun through his telescope onto a white sheet. Until then, everyone thought the Sun was a perfect unchanging disc of light. But the sunspots represented blemishes and nullified that perfection! Yet more importantly, Galileo saw that the spots moved across the face of the disc over a few hours, disappeared, and later came back around at the start. Gadzooks! The sun turns out to be a turning ball!

It was a critical moment of unification, evidence contributing to the realisation that all of the things and all of the stuff in the universe are fundamentally the same.

It's not the only time an intellectual watershed -- a radical, instant, disruptive re-framing of our place in the world -- has been delivered by astronomy.

For example -- and it wasn't really so long ago -- in the early 20th century, Edwin Hubble and his peers established that the 'nebulae' were actually galaxies just like the Milky Way, and that the universe therefore had to be tens of millions of times bigger than previously thought. This kind of revelation about our place in the scheme of things only comes from astronomy. And so astronomy is a very practical pursuit after all.

Posted in Science, Culture

We're not ready for genetic engineering

Update September 2012

The recent discovery that junk DNA is not actually junk rather reinforces my long-standing thesis, espoused below, that we don't know enough about how genes work to be able to validate genetic engineering artifacts by testing alone. I point out that computer programs are only validated by a mixture of testing, code inspection and theory, all of which is based on knowing how the code works at the instruction level. But we don't have a terribly complete picture of how genes interact. We always knew they were massively parallel, and now it turns out that junk DNA has some sort of role in gene expression across the whole of the genome, raising the combinatorial complexity enormously. This tells me that we have little idea how modifications at one point in the genome can impact the functioning at any number of other points (but it hints at an explanation as to why human beings are so much more complex than nematodes, despite having only a relatively small number of additional raw genetic instructions).

And now there is news that a cow in New Zealand, genetically engineered in respect of one allergenic protein, was born with no tail. It's too early to blame the GM for this oddity, but equally, the junk DNA finding surely undermines the confidence any genetic engineer can have in predicting that their changes will not have unexpected side effects.

________________________

Original post, 15 Jan 2011

As a software engineer years ago I developed a deep unease about genetic engineering and genetically modified organisms (GM). The software experience suggests to me that GM products cannot be verifiable given the state of our knowledge about how genes work. I’d like to share my thoughts.

Genetic engineering proponents seem to believe the entire proof of a GM pudding is in the eating. That is, if trials show that GM food is not toxic, then it must be safe, and there isn't anything else to worry about. The lesson I want others to draw from the still new discipline of software engineering is that there is more to the verification of correctness in complex programs than testing the end product.

Recently I’ve come across an Australian government-sponsored FAQ, “Arguments for and against gene technology” (May 2010), that supposedly provides a balanced view of both sides of the GM debate. Yet it sweeps important questions under the rug.

[At one point the paper invites readers to think about whether agriculture is natural. It’s a highly loaded question grounded in the soothing proposition that GM is simply an extension of the age-old artificial selection that gave us wheat, Merinos and all those different potatoes. The question glosses over the fact that when genes recombine under normal sexual reproduction, cellular mechanisms constrain where each gene can end up, and most mutations are stillborn. GM is not constrained; it jumps levels. It is quite unlike any breeding that has gone before.]

Genes are very frequently compared with computer software, for good reason. I urge that the comparison be examined more closely, so that lessons can be drawn from the long standing “Software Crisis”.

Each gene codes for a specific protein. That much we know. Less clear is how relatively few genes -- 20,000 for a nematode; 25,000 for a human being -- can specify an entire complex organism. Science is a long way from properly understanding how genes specify bodies, but it is clear that each genome is an immensely intricate ensemble of interconnected biochemical short stories. We know that genes interact with each other, turning each other on and off, and more subtly influencing how each is expressed. In software parlance, genetic codes are executed in a massively parallel manner. This combinatorial complexity is probably why I can share fully half of my genes with a turnip, and have an “executable file” in DNA that is only 20% longer than that of a worm, and yet I can be so incredibly different from those organisms.

If genomes are like programs then let’s remember they have been written achingly slowly over eons, to suit the circumstances of a species. Genomes are revised in a real world laboratory over billions of iterations and test cases, to a level of confidence that software engineers can’t even dream of. Brassica napus.exe (i.e. canola) is at v1000000000.1. Tinkering with isolated parts of this machinery, as if it were merely some sort of wiki with articles open to anyone to edit, could have consequences we are utterly unable to predict.

In software engineering, it is received wisdom that most bugs result from imprudent changes made to existing programs. Furthermore, editing one part of a program can have unpredictable and unbounded impacts on any other part of the code. Above all else, all but the very simplest software in practice is untestable. So mission critical software (like the implantable defibrillator code I used to work on) is always verified by a combination of methods, including unit testing, system testing, design review and painstaking code inspection. Because most problems come from human error, software excellence demands formal design and development processes, and high level programming languages, to preclude subtle errors that no amount of testing could ever hope to find.

How many of these software quality mechanisms are available to genetic engineers? Code inspection is moot when we don’t even know how genes normally interact with one another; how can we possibly tell by inspection if an artificial gene will interfere with the “legacy” code?

What about the engineering process? It seems to me that GM is akin to assembly programming, circa the 1960s. The state of the art in genetic engineering is nowhere near even Fortran, let alone modern object-oriented languages.

Can today’s genetic engineers demonstrate a rigorous verification regime, given the reality that complex software programs are inherently untestable?

We should pay much closer attention to the genes-as-software analogy. Some fear GM products because they are unnatural; others because they are dominated by big business and a mad rush to market. I simply say let’s slow down until we’re sure we know what we're doing.

Posted in Software engineering, Science

There is no algorithm for good management

There is a malaise in security. One problem is that as a "profession", we’ve tried to mechanise security management, as if it were just like generic manufacturing, amenable to ISO 9000-like management standards. We use essentially the same process and policy templates for all businesses. Don’t get me wrong: process is important, and we do want our security responses to be repeatable and uniform. But not robotic. The truth is, there is no algorithm for doing the right thing. Moreover, there can never be a universal management algorithm, and an underlying naive faith in such a thing is dulling our collective management skills.

An algorithm is a repeatable set of instructions or recipe that can be followed to automatically perform some task or solve some structured problem. Given the same conditions and the same inputs, an algorithm will always produce the same results. But no algorithm can cope with unexpected inputs or events; an algorithm’s designer needs to have a complete view of all input circumstances in advance.

Mathematicians have long known that some surprisingly simple tasks cannot be done algorithmically, or at least not efficiently. The classic ‘travelling salesman’ problem, of how to plot the shortest course through multiple connected towns, has no known efficient recipe for success; checking every route works in principle, but the number of routes explodes factorially (see the sketch below). There is no way to trisect an angle using a compass and straightedge. And there is no general method to tell whether an arbitrary computer program is ever going to stop.
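To get a feel for why brute force is no real recipe, here is a minimal sketch of exhaustive travelling-salesman search; it is fine for a handful of towns and hopeless beyond that:

```python
# Brute force for the travelling salesman problem: try every tour.
# It "works", but the tour count grows factorially with the town count,
# which is why no known algorithm solves large instances efficiently.
from itertools import permutations

def shortest_tour(dist):
    """dist[i][j] is the distance between towns i and j."""
    towns = range(1, len(dist))             # fix town 0 as the start
    best = None
    for order in permutations(towns):       # (n-1)! candidate tours
        tour = (0, *order, 0)
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if best is None or length < best[0]:
            best = (length, tour)
    return best

dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 8],
        [10, 4, 8, 0]]
print(shortest_tour(dist))  # fine for 4 towns; 20 towns means ~1.2e17 tours
```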

So when security is concerned so much of the time with the unexpected, we should be doubly careful about formulaic management approaches, especially template policies and checklist-based security audits!

Ok, but what's the alternative? This is extremely challenging, but we need to think outside the check box.

Like any complex management field, security is all about problem solving. There’s never going to be a formula for it. Rather, we need to put smart people on the job and let them get on with it, using their experience and their wits. Good security, like good design, frankly involves a bit of magic. We can foster security excellence through genuine expertise, teamwork, research, innovation and agility. We need security leaders who have the courage to treat new threats and incidents on their merits, trust their professional instincts, try new things, break the mould, and have the sense to avoid management fads.

I have to say I remain pessimistic. These are not good times for courageous managers. For the first rule of career risk management is to make sure everyone agrees in advance to whatever you plan to do, so the blame can be shared when something goes wrong. This is probably the real reason why people are drawn to algorithms in management: they can be documented, reviewed, signed off, and put back on the shelf in wait for a disaster and the inevitable audit. So long as everyone did what they said they were going to do in response to an incident, nobody is to blame.

So I'd like to see a lawsuit against a company with a perfect ISO 27001 record which still got breached, where the lawyers' case is that it is unreasonable to rely on algorithms to manage in the real world.

Posted in Security, Science, Management theory

Generic verisimilitude

"Generic verisimilitude" is a nice big word! It means the accepted visual language that conveys reality in genre movies. In other words, cinematic cliches.

I've always been bemused by the sideways figure-of-eight black frame that tells us when a movie character is looking through binoculars. Have movie makers ever actually used binoculars? You don't get the sideways "8"; you just see a nice black circle. But it's not the worst example.

I saw "Rachel getting married" a year or two ago, and thought it was pretty good except for the madly excessive handycam wobble. And I got thinking about that and realised what a terrible artifice it is. Ironically, handicam wobble has become the leading sign of generic verisimilitude in 'gritty' moviemaking, yet the wobble is entirely fictional.

One of the marvels of the human brain is the way it produces a steady image as we move around. We can walk, run, jump up and down even on a trampoline, and our steadfast perception of the world is that it stands still. This complicated feat of cognition is thought to involve feedback mechanisms that allow the brain to compensate for the visual field shifting around on our retinas as the skull moves, sorting out which movements are apparent because we're moving, and which movements are really out there. It's a really vital survival tool; you couldn't chase down a gazelle on the savanna if your cognition was confused by your own mad dashing about.

So, if the world doesn't actually look to me like it shifts when I move, then what is the point of a film maker foisting this jerkiness upon us? If I were really in the place of the cinematographer, no matter how much I danced about, I wouldn't see any wobble.

Moreover, motion pictures are the most voyeuristic artform. The whole cinematographic conceit is that you couldn't possibly be in the same room as the people you're privileged to be spying on. So again, why the "realism" of the handicam wobble, which is intended to make us feel like we're actually part of the action?

It's odd that in the face of suspension-of-disbelief, when the audience is already putty in their hands, filmmakers inject these falsehoods into the visual language of otherwise hyper-realistic movies.

UPDATED 10 Sep 2012

Another example. I was watching a mockumentary on TV, set in the present, featuring gen Yers, and the protagonists made a home movie. And when we see their movie, it is sepia-coloured and has vertical scratch lines. Now, when was the last time anyone used film and not digital video to make a home movie? I wonder what young people even make of this tricked-up home movie look?

Another example. NASA posts mosaic pictures from the Mars rover - like this one http://twitpic.com/at9ps7 - with the patchwork edges preserved, and where the colour matching is worse than what you can get with free panorama software on a mobile phone these days. With all their image processing powers, why wouldn't NASA smooth out the component pics? Are they inviting us to imagine standing on Mars like a tourist with our own point-and-click camera?

Posted in Science, Popular culture