A software engineer's memoir (work in progress)
I'm an ex software "engineer" [I have reservations about that term] with some life experience of ultra high rel development practices. It's fascinating how much about software quality I learned in the 1980s and 90s is relevant to info sec today.
I've had a trip down memory lane triggered by Karen Sandler's presentation at LinuxConf12 in Ballarat http://t.co/xvUkkaGl and her paper "Killed by code".
The software in implantable defibrillators
I'm still working my way through Sandler's materials. So this post is a work in progress.
What's really stark on first viewing of Karen's talk is the culture she experienced and how it differs from the implantable defib industry I knew in its beginnings 25 years ago.
Sandler had an incredibly hard and very off-putting time getting any defib company to explain their software. But when we started in this field, every single person in the company -- and many of our doctors -- would have been able to answer the question "What software does this defib run on?" The answer was "ours". And moreover, the FDA were highly aware of software quality issues. The whole medical device industry was still on edge from the notorious Therac 25 episode [ADD REF], a watershed in software verification.
A personal story
I was part of the team that wrote the code for the world's first implantable defibrillator.
In 1990 Telectronics (a tragic legend of Australian technology) released the world's first software controlled implantable defib, the model 4210. The computing technology was severely restricted by several factors: ultra low power consumption, and a limited number of microprocessor vendors that would warrant their chips for use in medical devices. The 4210 defib used a semi-customised 8 bit microcontroller based on the 6502, and a 32 KB byte-organised SRAM chip that held the entire executable. It clocked at 128kHz. The software had to be efficient, not only to ensure it could make some very tough real time rendezvous, but to keep the power consumption down; the micro consumed about 30% of the device's power over its nominal five year lifetime.
We wrote mostly in C, with some assembly coding for the kernel and some performance sensitive routines. The kernel was of our own design, multi-tasking, with hard real time performance requirements (in particular, for obvious reasons the system had to respond within tight specs to heart beat interrupts and we had to show we weren't ever going to miss an interrupt!) We also wrote the C compiler.
The 4210's software was 40,000 lines of C, developed by a team of 5-6 over several years; the total effort was 25 person-years. Some of the testing and pre-release validation is described in my blog post http://lockstep.com.au/blog/2011/02/23/programming-is-like-playwriti.html. The final code inspection involved a team of five working five-to-six hour days for two months, reading aloud and understanding every single line. When occasion called for checking assembly instructions, sometimes we took turns with pencil and paper pretending to be the accumulators, the index registers, the program counter and so on. No stone was left unturned.
The final walkthrough was quite a personnel challenge. One of the senior engineers (a genius who wrote the kernel and compiler) lobbied for inspecting the whole executable because he didn't want to rely on the correctness of the compiler -- but that would have taken six months. So we compromised by walking through only the assembly code for the critical modules, like the tachycardia detector and the interrupt handlers.
We amassed several thousand implant-years of experience with the 4210 before it was superceded. After release, we found two or three minor bugs, which we fixed with software upgrades. None would have caused a misfire, neither false positive or false negative.
Yes, for the upgrade we could write into the RAM over a proprietary telemetry protocol. In fact the main reason for the one major software upgrade in the field was to add error correction because after hundreds of device-years we noticed higher than expected bit flips from natural background radiation. That's a helluva story in itself. It was observed that had the code been in ROM, we couldn't have changed it but we wouldn't have had to change it for bit flips either.
Morals of the story
Anyway, some of the morals of the story so far:
- Software then was cool and topical, and the whole company knew how to talk about it. The real experts -- the dozen or so people in Sydney directly involved in the development -- were all well known worldwide by the executives, the sales reps, the field clinical engineers, and regulatory affairs. And we got lots of questions (in contrast to Karen Sandler's experience where all the caridologists and company people said nobody ever asked about the code).
- Everything about the software was controlled by the company: the operating system, the chip platform, the compiler, the telemetry protocol.
- We had a team of people that knew the code like the backs of their hands. Better in fact. It was reliable and, in hingsight, impregnable. Not that we worried about malware back in 1987-1990.
Where has software development gone?
So the sorts of issues that Karen Sandler is raising now, over two decades on, are astonishing on so many levels.
Why would anyone decide to write life support software on someone else's platform?
Why would they use wifi or Bluetooth for telemetry?
And if the medical device companies cut corners in software develeopment, one wonders what Westinghouse is doing with their cruise missile controllers.
[TO BE CONTINUED]Posted in Software engineering, Management theory
Programming is like playwriting
The software-as-a-profession debate continues largely untouched by each generation's innovations in production methods. Most of us in the 90s thought that formal methods, reuse and enforced modularity would introduce to software some of the hallmarks of real engineering: predictability, repeatability, measurability and quality. Yet despite Objected Oriented methods and sophisticated CASE tools, many of the human traits of software-as-a-craft remain with us.
The "software crisis" – the systemic inability to estimate software projects accurately, to deliver what's promised, and to meet quality expectations – is over 40 years old. Its causes are multiple and subtle, and despite ample innovation in languages, tools and methodologies, debates continue over what ails the software industry. The latest skirmish is a provocative suggestion from Forrester analyst Mark Gualtieri that we shift the emphasis from independent Quality Assurance back onto developers’ own responsibilities, or abandon QA altogether. There has been a strong reaction! The general devotion to QA I think is aided and abetted by today’s widespread fashion for corporate governance.
But we should listen to radical ideas like Gualtieri’s, rather than maintain a slavish devotion to orthodoxies. We should recognise that software engineering is inherently different from conventional engineering, because software itself is a different kind of material. Its properties make it less amenable to regular governance.
Simply, software does not obey the laws of nature. Building skyscrapers, tunnels, dams and bridges is relatively predictable. You start with site surveys and foundations, erect a sturdy framework and all sorts of temporary formers, flesh out the structure, fill in the services like power and plumbing, do the fit-out, and finally take away all the scaffolding.
Specifications for real engineering projects don’t change much, even over several years for really big initiatives. And the engineering tools don't change at all.
Software is utterly unlike this. You can start writing software anywhere you like, and before the spec is signed off. There aren’t any raw materials to specify and buy, and no quantity surveyors to deal with. Metaphorically speaking, the plumbing can go in before the framework. Hell, you don't even need a framework! Nothing physical holds a software system up. Flimsy software, as a material, is indistinguishable from the best code. No laws of physics dictate that you start at the bottom and work your way slowly upwards, using a symbiosis of material properties and gravity to keep your construction stable and well-behaved as it grows. The process of real world engineering is thick with natural constraints that ensure predictability (just imagine how wobbly a house would be if you could lay the bricks from the top down) whereas software development processes are almost totally arbitrary, except for the odd stricture imposed by high level languages.
Real world systems are thoroughly compartmentalised naturally. If a bearing fails in an air-conditioning plant in the basement, it’s not going to affect the integrity of any of the floor plates. On the other hand, nothing physically decouples lines of code; a bug in one part of the program can impinge on almost any other part (which incidentally renders traditional failure modes and effects analysis impossible). We only modularise software by artificial means, like banning goto statements and self-modifying code.
Coding is fast and furious. In a single day, a programmer can create a system probably more complex than an airport that takes more than 10,000 person-years to build. And software development is tremendous creative fun. Let's be honest: it's why the majority of programmers chose their craft in the first place.
Ironically the rapidity of programming contributes significantly to software project overruns. We only use software in information systems because it's faster to make and easier to modify than wired logic. So the temptation is irresistible to keep specs fluid and to accommodate new requirements at any time. Famously, the differences between prototype, beta and production product are marginal and arbitrary. Management and marketing take advantage of this fact, and unfortunately software engineers themselves yield too readily to the attraction of the last minute tweak.
I suggest programming is more like play writing than engineering, and many programmers (especially the really good ones!) are just as manageable as poets.
In both software and play writing, structure is almost entirely arbitrary. Because neither obey the laws of physics, the structure of software and plays comes from the act of composition. A good software engineer will know their composition from end to end. But another programmer can always come along and edit the work, inserting their own code as they see fit. It is received wisdom in programming that most bugs arise from imprudent changes made to old code.
Messing with a carefully written piece of software is fraught with danger, just as it is with a finished play. I could take Hamlet for instance, and hack it as easily as I might hack an old program -- add a character or two, or a whole new scene -- but the entire internal logic of the play would almost certainly be wrecked. It would be “buggy”.
I was a software development manager for some years in the cardiac pacemaker industry. We developed the world’s first software controlled automatic implantable defibrillator. It had several tens of thousands of lines of C, developed at a rate of about one tested line of code per person per day. We quantified it as the most reliable real time software ever written at the time.
I believe the outstanding quality resulted from a handful of special grassroots techniques:
1. We had independent software test teams that developed their own test cases and tools
2. We did obsessive source code inspections on all units before integration. And in the end, before we shipped, we did an end-to-end walkthrough of the frozen software. It took six of us two months straight. So we had several people who knew the entire object intimately.
3. We did early informal design reviews. As team leader, I favoured having my developers do a whiteboard presentation to the team of their early design ideas, no more than 48 hours after being given responsibility for a module. This helped prevent designers latching onto misconceptions at the formative stages.
4. We took our time. I was concerned that the CASE tools we introduced in the mid 90s might make code rather too easy to trot out, so at the same time I set a new rule that developers had toturn their workstations off for a whole day once a week, and work with pen and paper.
5. My internal coding standard included a requirement that when starting a new module, developers write their comments before they write their code, and their comments had to describe ‘why’ not ‘what’. Code is all syntax; the meaning and intent of any software can only be found in the natural language comments.
Code is so very unlike the stuff of other professions – soil and gravel, metals and alloys, nuts and bolts, electronics, even human flesh and blood - that the metaphor of engineering in the phrase “software engineering” may be dangerously misplaced. By coopting the term we have might have started out on the wrong foot, underestimating the fundamental challenge of forging a software profession. It won't be until software engineering develops the normative tools and standards, culture and patience of a true profession that the software crisis will turn around. And then corporate governance will have something to govern in software development.
Posted in Software engineering, Language, Culture
We're not ready for genetic engineering
As a software engineer years ago I developed a deep unease about genetic engineering and genetically modified organisms (GM). The software experience suggests to me that GM products cannot be verifiable given the state of our knowledge about how genes work. I’d like to share my thoughts.
Personally, I'm not even slightly queasy about eating the stuff; after all, I smoked when I was young and I still drink too much sometimes, so I'm not inclined to whinge non-specifically about poisons entering my body. I'm sure GM foods are not going to hurt me. Yet some GM proponents would have us believe that the entire proof of this pudding is in the eating. That is, if trials show that GM food is not toxic, then it must be safe, and there isn't anything else to worry about. The lesson I want others to draw from the still new discipline of software engineering is that verification of correctness in complex systems is much much harder than tesing the end product.
Recently I’ve come across an Australian government-sponsored FAQ Arguments for and against gene technology (May 2010) that supposedly provides a balanced view of both sides of the GM debate. Yet it sweeps important questions under the rug.
[At one point the paper invites readers to think about whether agriculture is natural. It’s a soothing question grounded in the cool proposition that GM is simply an extension of the age old artificial selection that gave us wheat, Merinos and all those different potatoes. Yet the question glosses over the fact that when genes recombine under normal sexual reproduction, cellular mechanisms constrain where each gene can end up, and most mutations are still-born. GM is not constrained; it jumps levels. It is quite unlike any breeding that has gone before.]Genes are very frequently compared with computer software, for good reason. I urge that the comparison be examined more closely, so that lessons can be drawn from the long standing “Software Crisis”.
Each gene codes for a specific protein. That much we know. Less clear is how relatively few genes -- 20,000 for a nematode; 25,000 for a human being -- can specify an entire complex organism. Science is a long way from properly understanding how genes specify bodies, but it is clear that each genome is an immensely intricate ensemble of interconnected biochemical short stories. We know that genes interact with each other, turning each other on and off, and more subtly influencing how each is expressed. In software parlance, genetic codes are executed in a massively parallel manner. This combinatorial complexity is probably why I can share fully half of my genes with a turnip, and have an “executable file” in DNA that is only 20% longer than that of a worm, and yet I can be so incredibly different from those organisms.
If genomes are like programs then let’s remember they have been written achingly slowly over eons, to suit the circumstances of a species. Genomes are revised in a real world laboratory over billions of iterations and test cases, to a level of confidence that software engineers can’t even dream of. Brassica napus.exe (i.e. canola) is at v1000000000.1. Tinkering with isolated parts of this machinery, as if it were merely some sort of wiki with articles open to anyone to edit, could have consequences we are utterly unable to predict.
In software engineering, it is received wisdom that most bugs result from imprudent changes made to existing programs. Furthermore, editing one part of a program can have unpredictable and unbounded impacts on any other part of the code. Above all else, all but the very simplest software in practice is untestable. So mission critical software (like the implantable defibrillator code I used to work on) is always verified by a combination of methods, including unit testing, system testing, design review and painstaking code inspection. Because most problems come from human error, software excellence demands formal design and development processes, and high level programming languages, to preclude subtle errors that no amount of testing could ever hope to find.
How many of these software quality mechanisms are available to genetic engineers? Code inspection is moot when we don’t even know how genes normally interact with one another; how can we possibly tell by inspection if an artificial gene will interfere with the “legacy” code?
What about the engineering process? It seems to me that GM is akin to assembly programming circa 1960s. The state-of-the-art in genetic engineering is nowhere near even Fortran, let alone modern object oriented languages.
Can today’s genetic engineers demonstrate a rigorous verification regime, given the reality that complex software programs are inherently untestable?
We should pay much closer attention to the genes-as-software analogy. Some fear GM products because they are unnatural; others because they are dominated by big business and a mad rush to market. I simply say let’s slow down until we’re sure we know what we're doing.
Posted in Software engineering, Science