I'm an ex software "engineer" [I have reservations about that term] with some life experience of ultra high rel development practices. It's fascinating how much about software quality I learned in the 1980s and 90s is relevant to info sec today.
I've had a trip down memory lane triggered by Karen Sandler's presentation at LinuxConf12 in Ballarat http://t.co/xvUkkaGl and her paper "Killed by code".
The software in implantable defibrillators
I'm still working my way through Karen Sandler's materials. So this post is a work in progress.
What's really stark on first viewing of Karen's talk is the culture she experienced and how it differs from the implantable defib industry I knew in its beginnings 25 years ago.
Karen had an incredibly hard and very off-putting time getting the company that made her defib to explain their software. But when we started in this field, every single person in the company -- and many of our doctors -- would have been able to answer the question What software does this defib run on?: the answer was "ours". And moreover, the FDA were highly aware of software quality issues. The whole medical device industry was still on edge from the notorious Therac 25 episode, a watershed in software verification.
A personal story
I was part of the team that wrote the code for the world's first software controlled implantable cadrioverter/defibrillator (ICD).
In 1990 Telectronics (a tragic legend of Australian technology) released the model 4210, which was probably third of all ICDs onto the market, the first couple being hard wired devces from CPI Inc. The computing technology was severely restricted by several design factors, most especially ultra low power consumption, and a very limited number of microprocessor vendors that would warrant their chips for use in medical devices. The 4210 defib used a semi-customised 8 bit microcontroller based on the 6502, and a 32 KB byte-organised SRAM chip that held the entire executable. The micro clocked at 128kHz, fully eight times slower than the near identical micro in the Apple II a decade earlier. The software had to be efficient, not only to ensure it could make some very tough real time rendezvous, but to keep the power consumption down; the micro consumed about 30% of the device's power over its nominal five year lifetime.
We wrote mostly in C, with some assembly coding for the kernel and some performance sensitive routines. The kernel was of our own design, multi-tasking, with hard real time performance requirements (in particular, for obvious reasons the system had to respond within tight specs to heart beat interrupts and we had to show we weren't ever going to miss an interrupt!) We also wrote the C compiler.
The 4210's software was 40,000 lines of C, developed by a team of 5-6 over several years; the total effort was 25 person-years. Some of the testing and pre-release validation is described in my blog post about coding being like play writing. The final code inspection involved a team of five working five-to-six hour days for two months, reading aloud and understanding every single line. When occasion called for checking assembly instructions, sometimes we took turns with pencil and paper pretending to be the accumulators, the index registers, the program counter and so on. No stone was left unturned.
The final walk-through was quite a personnel management challenge. One of the senior engineers (a genius who also wrote our kernel and compiler) lobbied for inspecting the whole executable because he didn't want to rely on the correctness of the compiler -- but that would have taken six months. So we compromised by walking through only the assembly code for the critical modules, like the tachycardia detector and the interrupt handlers.
I mentioned that the kernel and compiler were home-grown. So this meant that the company controlled literally every single bit of code running in its defibs. And I reiterate we had several individuals who knew the source code end to end.
By the way, these days I will come across developers in the smartcard industry who find it hard working on applets that are 5 or 10 kilobytes small. Compare say a digital signing applet with an ICD, with its cardiac monitoring algorithms, treatment algorithms, telemetry controller, data logging and operating system all squeezed into 32KB.
We amassed several thousand implant-years of experience with the 4210 before it was superseded. After release, we found two or three minor bugs, which we fixed with software upgrades. None would have caused a misfire, neither false positive or false negative.
Yes, for the upgrade we could write into the RAM over a proprietary telemetry protocol. In fact the main reason for the one major software upgrade in the field was to add error correction because after hundreds of device-years we noticed higher than expected bit flips from natural background radiation. That's a helluva story in itself. It was observed that had the code been in ROM, we couldn't have changed it but we wouldn't have had to change it for bit flips either.
Morals of the story
Anyway, some of the morals of the story so far:
- Software then was cool and topical, and the whole company knew how to talk about it. The real experts -- the dozen or so people in Sydney directly involved in the development -- were all well known worldwide by the executives, the sales reps, the field clinical engineers, and regulatory affairs. And we got lots of questions (in contrast to Karen Sandler's experience where all the caridologists and company people said nobody ever asked about the code).
- Everything about the software was controlled by the company: the operating system, the chip platform, the compiler, the telemetry protocol.
- We had a team of people that knew the code like the backs of their hands. Better in fact. It was reliable and, in hindsight, impregnable. Not that we worried about malware back in 1987-1990.
Where has software development gone?
So the sorts of issues that Karen Sandler is raising now, over two decades on, are astonishing to me on so many levels.
- Why would anyone decide to write life support software on someone else's platform?
- Why would they use wifi or Bluetooth for telemetry?
- And if the medical device companies cut corners in software development, one wonders what the defense industry is doing with their drone flight controllers and other "smart" weaponry with its countless millions of lines of opaque software.
A colleague drew my attention to what he called "yet another management standard". Which got me thinking about where our preoccupation with standards might be heading and where it might end.
Most modern risk management standards allow for exception management. If a company has a formal procedure in place -- for example a Disaster Recovery Plan -- but something out of the ordinary comes up, then the latest standards provide management with flexibility to vary their response to suit their particular circumstances; in other words, management can generally waive regular procedures and "accept the risk". The company can remain in compliance with management systems and standards if it documents these exceptions carefully.
So ... what if a company says "the hell with this latest management standard, we don't want to have anything to do with it". If the standard allows for exceptions, then the company may still be in compliance with the standard by not being in compliance with it.
How about that: a standard you cannot help but comply with!
And then we wouldn't need auditors. We might even start to make some real progress.
Here's a less facetious analysis of the perils of over-standardisation: http://lockstep.com.au/blog/2010/12/21/no-algorithm-for-management.
Posted in Management theory
In cyber security, user awareness, education and training have long gone past their Use By Date. We have technological problems that need technological fixes, yet governments and businesses remain averse to investing in real security. Instead, the long standing management fad is to 'audit ourselves' out of trouble, and to over-play user awareness as a security measure when the systems we make them use are inherently insecure.
It’s a massive systemic failure in the security profession.
We see a policy and compliance fixation everywhere. The dominant philosophy in security is obsessed with process. The international information security standard ISO 27001 is a management system standard; it has almost nothing to say universally about security technology. Instead the focus is on documentation and audit. Box ticking. It’s intellectually a carbon-copy of the ISO 9001 quality management standard, and we all know the limitations of that.
Or do we? Remember that those who don’t know the lessons of history are condemned to repeat it. I urge all infosec practitioners to read this decade old article: Is ISO 9000 really a standard? -- it should ring some bells.
Education, policy and process are almost totally useless in fighting ID theft. Consider this: those CD ROMs with 25,000,000 financial records, lost in the mail by British civil servants in 2007 were valued at 1.5 billion pounds, using the going rate on the stolen identity black market. With stolen data being so immensely valuable, just how is security policy ever going to stop insiders cashing in on such treasure?
In another case, after data was lost by the Australian Tax Office, there was earnest criticism that the data should have been encrypted. But so what if it was? What common encryption method could not be cracked by organised crime if there was millions and millions of dollars worth of value to be gained?
The best example of process and policy-dominated security is probably the Payment Card Industry Data Security Standard PCI-DSS. The effectiveness of PCI-DSS and its onerous compliance regime was considered by a US Homeland Security Congressional Committee in March 2009. In hearings, the National Retail Federation submitted that “PCI has been plagued by poor execution ... The PCI guidelines are onerous, confusing, and are constantly changing”. They noted the irony that “the credit card companies’ rules require merchants to store credit card data that many retailers do not want to keep” (emphasis in original). The chair committee remarked that “The essential flaw with the PCI Standard is that it allows companies to check boxes, but not necessarily be secure. Compliance does not equal security. We have to get beyond check box security.”
To really stop ID theft, we need proper technological preventative measures, not more policies and feel-good audits.
The near exclusive emphasis on user education and awareness is a subtle form of blame shifting. It is simply beyond the capacity of regular users to tell pharming sites from real sites, or even to spot all phishing e-mails. What about the feasibility of training people to "shop safely" online? It's a flimsy proposition, considering that the biggest cases of credit card theft have occurred at backend databases of department store chains and payments processors. Most stolen card details in circulation probably originate from regular in-store Card Present transactions, and not from Internet sites. The lesson is even if you never ever shop online, you can have your card details stolen and abused behind your back. All the breathless advice about looking out for the padlock is moot.
In other walks of life we don’t put all the onus on user education. Think about car safety. Yes good driving practices are important, but the major focus is on legislated standards for automotive technology, and enforceable road rules. In contrast, Internet security is dominated by a wild west, everyone-for-themselves mentality, leading to a confusing patchwork of security gizmos, proprietary standards and no common benchmarks.
There is a malaise in security. One problem is that as a "profession", we’ve tried to mechanise security management, as if it were just like generic manufacturing, ammenable to ISO 9000-like management standards. We use essentially the same process and policy templates for all businesses. Don’t get me wrong: process is important, and we do want our security responses to be repeatable and uniform. But not robotic. The truth is, there is no algorithm for doing the right thing. Moreover, there can never be a universal management algorithm, and an underlying naive faith in such a thing is dulling our collective management skills.
An algorithm is a repeatable set of instructions or recipe that can be followed to automatically perform some task or solve some structured problem. Given the same conditions and the same inputs, an algorithm will always produce the same results. But no algorithm can cope with unexpected inputs or events; an algorithm’s designer needs to have a complete view of all input circumstances in advance.
Mathematicians have long known that some surprisingly simple tasks cannot be done algorithmically. The classic ‘travelling salesman’ problem, of how to plot the shortest course through multiple connected towns, has no single recipe for success. There is no way to trisect an angle using a compass and a ruler. There is no consistent way to tell if any given computer program is ever going to stop.
So when security is concerned so much of the time with the unexpected, we should be doubly careful about formulaic management approaches, especially template policies and checklist-based security audits!
Ok, but what's the alternative? This is extremely challenging, but we need to think outside the check box.
Like any complex management field, security is all about problem solving. There’s never going to be a formula for it. Rather, we need to put smart people on the job and let them get on with it, using their experience and their wits. Good security like good design frankly involves a bit of magic. We can foster security excellence through genuine expertise, teamwork, research, innovation and agility. We need security leaders who have the courage to treat new threats and incidents on their merits, trust their professional instincts, try new things, break the mould, and have the sense to avoid management fads.
I have to say I remain pessimistic. These are not good times for couragous managers. For the first rule of career risk management is to make sure everyone agrees in advance to whatever you plan to do, so the blame can be shared when something goes wrong. This is probably the real reason why people are drawn to algorithms in management: they can be documented, reviewed, signed off, and put on back the shelf in wait for a disaster and the inevitable audit. So long as everyone did what the said they were going to do in response to an incident, nobody is to blame.
So I'd like to see a law suit against a company with a perfect ISO 27001 record which still got breached, where the lawyers's case is that it is unreasonable to rely on algorithms to manage in the real world.