Update September 2012
The recent discovery that junk DNA is not actually junk rather reinforces my long standing thesis, espoused below, that we don’t know enough about how genes work to be able to validate genetic engineering artifacts by empirical testing alone. I point out that computer programs are only validated by a coordinated mixture of testing, code inspection and theory, all of which based on knowing how the code works at the instruction level. But we don’t have a terribly complete picture of how genes interact. We always knew they were ‘massively parallel’, and now it turns out that junk DNA has some sort of role in gene expression across the whole of the genome, raising the combinatorial complexity even further. This tells me we have little idea how modifications at one point in the genome can impact the functioning at any number of other points (but it hints at an explanation as to why human beings are so much more complex than nematodes despite having only a modestly larger genome).
And now there is news that a cow in New Zealand, genetically engineered in respect of one allergenic protein, was born with no tail. It’s too early to blame the GM for this oddity, but equally, the junk DNA finding surely undermines the confidence that any genetic engineer can have in predicting that their changes cannot have had unexpected and really unpredictable side effects.
Original post, 15 Jan 2011
As a software engineer years ago I developed a deep unease about genetic engineering and genetically modified organisms (GM). The software experience suggests to me that GM products cannot be verifiable given the state of our knowledge about how genes work. I’d like to share my thoughts.
Genetic engineering proponents seem to believe the entire proof of a GM pudding is in the eating. That is, if trials show that GM food is not toxic, then it must be safe, and there isn’t anything else to worry about. The lesson I want others to draw from the still new discipline of software engineering is there is more to the verification of correctness in complex programs than tesing the end product.
Recently I’ve come across an Australian government-sponsored FAQ Arguments for and against gene technology (May 2010) that supposedly provides a balanced view of both sides of the GM debate. Yet it sweeps important questions under the rug.
Genes are very frequently compared with computer software, for good reason. I urge that the comparison be examined more closely, so that lessons can be drawn from the long standing “Software Crisis”.
Each gene codes for a specific protein. That much we know. Less clear is how relatively few genes — 20,000 for a nematode; 25,000 for a human being — can specify an entire complex organism. Science is a long way from properly understanding how genes specify bodies, but it is clear that each genome is an immensely intricate ensemble of interconnected biochemical short stories. We know that genes interact with each other, turning each other on and off, and more subtly influencing how each is expressed. In software parlance, genetic codes are executed in a massively parallel manner. This combinatorial complexity is probably why I can share fully half of my genes with a turnip, and have an “executable file” in DNA that is only 20% longer than that of a worm, and yet I can be so incredibly different from those organisms.
If genomes are like programs then let’s remember they have been written achingly slowly over eons, to suit the circumstances of a species. Genomes are revised in a real world laboratory over billions of iterations and test cases, to a level of confidence that software engineers can’t even dream of. Brassica napus.exe (i.e. canola) is at v1000000000.1. Tinkering with isolated parts of this machinery, as if it were merely some sort of wiki with articles open to anyone to edit, could have consequences we are utterly unable to predict.
In software engineering, it is received wisdom that most bugs result from imprudent changes made to existing programs. Furthermore, editing one part of a program can have unpredictable and unbounded impacts on any other part of the code. Above all else, all but the very simplest software in practice is untestable. So mission critical software (like the implantable defibrillator code I used to work on) is always verified by a combination of methods, including unit testing, system testing, design review and painstaking code inspection. Because most problems come from human error, software excellence demands formal design and development processes, and high level programming languages, to preclude subtle errors that no amount of testing could ever hope to find.
How many of these software quality mechanisms are available to genetic engineers? Code inspection is moot when we don’t even know how genes normally interact with one another; how can we possibly tell by inspection if an artificial gene will interfere with the “legacy” code?
What about the engineering process? It seems to me that GM is akin to assembly programming circa 1960s. The state-of-the-art in genetic engineering is nowhere near even Fortran, let alone modern object oriented languages.
Can today’s genetic engineers demonstrate a rigorous verification regime, given the reality that complex software programs are inherently untestable?
We should pay much closer attention to the genes-as-software analogy. Some fear GM products because they are unnatural; others because they are dominated by big business and a mad rush to market. I simply say let’s slow down until we’re sure we know what we’re doing.