The ongoing debate (or spat) on Twitter about the "No Estimates" movement had me reaching for the archives.
Some now say that being forced to provide estimates is somehow counter-productive for software developers. I've long thought about programming productivity, and the paradox that software is too soft.
Some programmers want special treatment. In effect, "No Estimates" proponents are claiming their particular work is not amenable to traditional metrics and management. Now in a way, they're right; there is as yet no such thing as software "engineering". There are none of the handbooks or standards that feature in chemical, mechanical and electrical engineering. But nevertheless, if a programmer knows what they're doing - if they know their subject matter and how their code behaves - then providing estimates is not all that difficult. Disclaiming one's ability to predict how long a task will take is a weird way to try and engage with the business.
Software is definitely a difficult medium. It's highly non-linear, and breeds amazing complexity. But a great many of today's problems, like the recent #gotofail and Heartbleed scandals, are manifestly due to chaotic development practices.
As such, programmers are part of the problem.
I once wrote a letter to the editor of ComputerWorld about this ...
Yes indeed, IT is made the scapegoat for a great many project disasters (ComputerWorld 28 September, 2005, page 1). But it may prove fruitless to force orthodox project management and corporate governance methodologies onto big IT projects. And at the same time, IT "professionals" are not entirely free of blame.
So the KPMG Global IT Project Management Survey found that the vast majority of technology projects run over budget. In the main, "technology" means software, whether we build or buy. The "software crisis" - the systemic inability to estimate software projects accurately and to deliver what's promised - is about 40 years old. And it's more subtle than KPMG suggests in blaming corporate governance. It is fashionable at the moment to look to governance to rectify business problems but in this case, it really is a technology issue.
Software project management truly is different from all other technical fields, for software does not obey the laws of nature. Building skyscrapers, tunnels, dams and bridges is relatively predictable. You start with site surveys and foundations, erect a sturdy framework, fill in the services, fit it out, and take away the scaffolding. Specifications don't change much over a several year project, and the tools don't change at all.
But with software, you can start a big project anywhere you like, and before the spec is signed off. Metaphorically speaking, the plumbing can go in before the framework. Hell, you don't even need a framework! Nothing physical holds a software system up.
And software coding is fast and furious. In a single day, a programmer can create a system more complex than an airport that might take 10,000 person-years to build. So software development is fun. Let's be honest: it's why the majority of programmers chose their craft in the first place.
Ironically it's the rapidity of programming that contributes the most to project overruns. We only use software in information systems because it's fast to write and easy to modify. So the temptation is irresistible to keep specs fluid and to change requirements at any time. Famously, the differences between prototype, "beta release" and product are marginal and arbitrary. Management and marketing take advantage of this fact, and unfortunately software engineers themselves yield too readily to the attraction of the last minute tweak.
The same dynamics of course afflict third party software components. They tend to change too often and fail to meet expectations, making life hell for IT systems integrators.
It won't be until software engineering develops the tools, standards and culture of a true profession that any of this will change. Then corporate governance will have something to govern in big technology projects. Meanwhile, programmers will remain more like playwrights than engineers, and just as manageable.
Summary: BlackBerry is poised for a fresh and well differentiated play in the Internet of Things, with its combination of handset hardware security, its uniquely rated QNX operating system kernel, and its experience with the FIDO device authentication protocols.
To put it plainly, BlackBerry is not cool.
And neither is security.
But maybe two wrongs can make a right, in terms of a compelling story. BlackBerry's security story has always been strong, it's getting stronger, and it could save them.
Today I attended the BlackBerry Security Summit in New York City (Disclosure: my travel and accommodation were paid by BlackBerry). The event was announced very recently; none of my colleagues had heard of it. So what was the compelling need to put on a security show in New York? It turned out to be the 9:00am announcement that BlackBerry is acquiring the German voice security specialists Secusmart. BlackBerry and Secusmart have worked together for a long time; their stated aim is to put a real secure phone in the "hand of every President and every Chancellor".
Secusmart CEO Hans-Christoph Quelle is a forceful champion of voice security; in this age of evidently routine spying by state and competitors alike, there is enormous demand building for counter-surveillance in telephony and messaging. Secusmart is also responsible for the highly rated Micro SD cards that BlackBerry proudly use as removable security modules in their handsets. And this is where the SecuSmart tie-up really resonates for me. It comes hot on the heels of last week's Cloud Security Summit, where there was so much support for personal Hardware Security Modules (HSMs), be they Micro SD cards, USB keys, NFC Secure Elements, the good old "Trusted Platform Module" (TPM) or any number of proprietary chip sets.
Today's event also showcased BlackBerry's QNX division (acquired in 2010) and its secure operating system. CEO John Chen reckons that the software in 50% of connected cars runs on the QNX OS (and in high reliability settings like power stations, wind turbines and even gaming machines, the penetration is even higher). And so he is positioning BlackBerry as a major player in the Internet of Things.
We heard from QNX founder Dan Dodge about the elegance of their system. At just 100,000 lines of code, Dodge stressed that his team knows the software inside-out. There is not a single line of code in their OS that QNX did not write themselves. In contrast, such mastery is utterly impossible in the 15,000,000 lines that make up Linux or the estimated 50-70 million lines in Windows. It happens that I've recently lamented the parlous state of software quality and the need to return to first principles security. So I am on Dan Dodge's wavelength.
BlackBerry's security people had a little bit to say about identity as well, and apparently more's to come. For now, they are flagging that with 250 million customers in their messaging system, BBM represents "one of the biggest identity systems in the world". And as such the company does plan to "federate" it somehow. They reminded us at the same time of the BlackBerry Cloud slated for launch in December.
Going forward, the importance of strong, physical Two Factor Authentication for accessing the cloud is almost a given now. And the smartphone is fast becoming the predominant access mechanism, so the combination of secure elements, handsets and high security infrastructure is potent.
There's a lot that BlackBerry is keeping close to its chest, but for me one extant piece of the IoT puzzle was conspicuously absent today: the role of the FIDO Alliance protocols. After all, BlackBerry has been a FIDO Board Member for a long time. It seems to me that FIDO's protocols for exchanging verified authentication signals and information about devices should be an important element of BlackBerry's play in both its software infrastructure and its devices.
In closing, I'll revisit the very first thing we heard at today's event. It was a video testimonial, telling us "If you need nuclear security, you need BlackBerry". As I said, security really isn't cool. Jazzing up the company's ability to deliver "nuclear" grade to demanding clients is actually not the right message. Security in the Internet of Things -- and therefore in everyday life -- may turn out to be just as important.
We basically know that nuclear power plants are inherently risky; we know that planes will occasionally fall out of the sky. Paradoxically, the community has a reasonable appetite for risk and failures in very complex systems like those. Individually and/or collectively we have decided we just can't live without electricity and travel and so we've come to settle on a roughly acceptable finite cost in terms of failures. But when the mundanities of life go digital, the tolerance of failure will drop. When our cars and thermostats and light switches are connected to the Internet, and when a bug or a script kiddie's stunt can soon send whole neighbourhoods into a spin, consumers won't stand for it.
So the very best security we can currently engineer is in fact going to be necessary at scale for smart appliances, wearables, connected homes, smart meters and networked cars. We need a different gauge for this type of security, and it's going to be very tough to engineer and deploy economically. But right now, with its deep understanding of dependable OS's and commitment to high quality device hardware, it seems to me BlackBerry has a head-start in the Internet of Things.
For the past year, oncologists at the Memorial Sloan Kettering Cancer Centre in New York have been training IBM’s Watson – the artificial intelligence tour-de-force that beat allcomers on Jeopardy – to help personalise cancer care. The Centre explains that "combining [their] expertise with the analytical speed of IBM Watson, the tool has the potential to transform how doctors provide individualized cancer treatment plans and to help improve patient outcomes". Others are speculating already that Watson could "soon be the best doctor in the world".
I have no doubt that when Watson and things like it are available online to doctors worldwide, we will see overall improvements in healthcare outcomes, especially in parts of the world now under-serviced by medical specialists [having said that, the value of diagnosing cancer in poor developing nations is questionable if they cannot go on to treat it]. As with Google's self-driving car, we will probably get significant gains eventually, averaged across the population, from replacing humans with machines. Yet some of the foibles of computing are not well known and I think they will lead to surprises.
For all the wondrous gains made in Artificial Intelligence, where Watson now is the state-of-the art, A.I. remains algorithmic, and for that, it has inherent limitations that don't get enough attention. Computer scientists and mathematicians have know for generations that some surprisingly straightforward problems have no algorithmic solution. That is, some tasks cannot be accomplished by any universal step-by-step codified procedure. Examples include the Halting Problem and the Travelling Salesperson Problem. If these simple challenges have no algorithm, we need be more sober in our expectations of computerised intelligence.
A key limitation of any programmed algorithm is that it must make its decisions using a fixed set of inputs that are known and fully characterised (by the programmer) at design time. If you spring an unexpected input on any computer, it can fail, and yet that's what life is all about -- surprises. No mathematician seriously claims that what humans do is somehow magic; most believe we are computers made of meat. Nevertheless, when paradoxes like the Halting Problem abound, we can be sure that computing and cognition are not what they seem. We should hope these conundrums are better understood before putting too much faith in computers doing deep human work.
And yet, predictably, futurists are jumping ahead to imagine "Watson apps" in which patients access the supercomputer for themselves. Even if there were reliable algorithms for doctoring, I reckon the "Watson app" is a giant step, because of the complex way the patient's conditions are assessed and data is gathered for the diagnosis. That is, the taking of the medical history.
In these days of billion dollar investments in electronic health records (EHRs), we tend to think that medical decisions are all about the data. When politicians announce EHR programs they often boast that patients won't have to go through the rigmarole of giving their history over and over again to multiple doctors as they move through an episode of care. This is actually a serious misunderstanding of the importance in clinical decision-making of the interaction between medico and patient when the history is taken. It's subtle. The things a patient chooses to tell, the things they seem to be hiding, and the questions that make them anxious, all guide an experienced medico when taking a history, and provide extra cues (metadata if you will) about the patient’s condition.
Now, Watson may well have the ability to navigate this complexity and conduct a very sophisticated Q&A. It will certainly have a vastly bigger and more reliable memory of cases than any doctor, and with that it can steer a dynamic patient questionnaire. But will Watson be good enough to be made available direct to patients through an app, with no expert human mediation? Or will a host of new input errors result from patients typing their answers into a smart phone or speaking into a microphone, without any face-to-face subtlety (let alone human warmth)? It was true of mainframes and it’s just as true of the best A.I.: Bulldust in, bulldust out.
Finally, Watson's existing linguistic limitations are not to be underestimated. It is surely not trivial that Watson struggles with puns and humour. Futurist Mark Pesce when discussing Watson remarked in passing that scientists don’t understand the "quirks of language and intelligence" that create humour. The question of what makes us laugh does in fact occupy some of the finest minds in cognitive and social science. So we are a long way from being able to mechanise humour. And this matters because for the foreseeable future, it puts a great deal of social intercourse beyond AI's reach.
In between the extremes of laugh-out-loud comedy and a doctor’s dry written notes lies a spectrum of expressive subtleties, like a blush, an uncomfortable laugh, shame, and the humiliation that goes with some patients’ lived experience of illness. Watson may understand the English language, but does it understand people?
Watson can answer questions, but good doctors ask a lot of questions too. When will this amazing computer be able to hold the sort of two-way conversation that we would call a decent "bedside manner"?
Have a disruptive technology implementation story? Get recognised for your leadership. Apply for the 2014 SuperNova Awards for leaders in disruptive technology.
For the second time in as many months, a grave bug has emerged in core Internet security software. In February it was the "Goto Fail" bug in the Apple operating system iOS that left web site security inoperable; now we have "Heartbleed", a flaw that leaves many secure web servers in fact open to attackers sniffing memory contents looking for passwords and keys.
Who should care?
There is no shortage of advice on what to do if you're a user. And it's clear how to remediate the Heartbleed bug if you're a web administrator (a fix has been released). But what is the software fraternity going to do to reduce the incidence of these disastrous human errors? In my view, Goto Fail and Heartbleed are emblematic of chaotic software craftsmanship. It appears that goto statements are used with gay abandon throughout web software today, creating exactly the unmaintainable spaghetti code that the founders of Structured Programming warned us about in the 1970s. Testing is evidently lax; code inspection seems non-existent. The Heartbleed flaw is in a piece of widely used Open Source software, and was over-looked first by the programmer, and then by the designated inspector, and then it went unnoticed for two years in the wild.
What are the ramifications of Heartbleed?
"Heartbleed" is a flaw in an obscure low level feature of the "Transport Layer Security" (TLS) protocol. TLS has an optional feature dubbed "Heartbeat" which a computer connected in a secure session can use to periodically test if the other computer is still alive. Heartbeat involves sending a request message with some dummy payload, and getting back a response with duplicate payload. The bug in Heartbeat means the responding computer can be tricked into sending back a dump of 64 kiloytes of memory, because the payload length variable goes unchecked. (For the technically minded, this error is qualitatively similar to a buffer overload; see also the OpenSSL Project description of the bug). Being server memory used in security management, that random grab has a good chance of including sensitive TLS-related data like passwords, credit card numbers and even TLS session keys. The bug is confined to the OpenSSL security library, where it was introduced inadvertently as part of some TLS improvements in late 2011.
The flawed code is present in almost all Open Source web servers, or around 66% of all web servers worldwide. However not all servers on the Internet run SSL/TLS secure sessions. Security experts Netcraft run automatic surveys and have worked out that around 17% of all Internet sites would be affected by Heartbleed – or around half a million widely used addresses. These include many banks, financial services, government services, social media companies and employer extranets. An added complication is that the Heartbeat feature leaves no audit trail, and so a Heartbleed exploit is undetectable.
If you visit an affected site and start a secure ("padlocked") session, then an attacker that knows about Heartbleed can grab random pieces of memory from your session. Researchers have demonstrated that session keys can be retrieved, although it is said to be difficult. Nevertheless, Heartbleed has been described by some of the most respected and prudent commentators as catastrophic. Bruce Schneier for one rates its seriousness as "11 out of 10".
Should we panic?
No. The first rule in any emergency is "Don't Panic". But nevertheless, this is an emergency.
The risk of any individual having been harmed through Heartbleed is probably low, but the consequences are potentially grave (if for example your bank is affected). And soon enough, it will be simple and cheap to take action, so you will hear experts say 'it is prudent to assume you have been compromised' and to change your passwords.
However, you need to wait rather than rush into premature action. Until the websites you use have been fixed, changing passwords now may leave you more vulnerable, because it's highly likely that criminals are trying to exploit Heartbleed while they can. It's best to avoid using any secure websites for the time being. We should redouble the usual Internet precautions: check your credit card and bank statements (but not online for the time being!). Stay extra alert to suspicious looking emails not just from strangers but from your friends and colleagues too, for their cloud mail accounts might have been hacked. And seek out the latest news from your e-commerce sites, banks, government and so on. The Australian banks for instance were relatively quick to act; by April 10 the five biggest institutions confirmed they were safe.
Lessons for the Software Craft
Heartbleed for me is the last straw. I call it pathetic that mission critical code can harbour flaws like this. So for a start, in the interests of clarity, I will no longer use the term "Software Engineering". I've written a lot in the past about the practice and the nascent profession of programming but it seems we're just going backwards. I'm aware that calling programming a "craft" will annoy some people; honestly, I mean no offence to basket weavers.
I'm no merchant of doom. I'm not going to stop banking and shopping online (however I do refuse Internet facing Electronic Health Records, and I would not use a self-drive car). My focus is on software development processes and system security.
The modern world is increasingly dependent on software, so it passes understanding we still tolerate such ad hoc development processes.
The programmer responsible for the Heartbleed bug has explained that he made a number of changes to the code and that he "missed validating a variable" (referring to the unchecked length of the Heartbeat payload). The designated reviewer of the OpenSSL changes also missed that the length was not validated. The software was released into the wild in March 2012. It went unnoticed (well, unreported) until a few weeks ago and was rectified in an OpenSSL release on April 7.
I'd like to avoid apportioning individual blame, so I am not interested in the names of the programmer and the reviewer. But we have to ask: when so many application security issues boil down to overflow problems, why is it not second nature to watch out for bugs like Heartbleed? How did experienced programmers make such an error? Why was this flaw out in the wild for two years before it was spotted? I thought one of the core precepts of Open Source Software was that having many eyes looking over the code means that errors will be picked up. But code inspection seems not to be widely practiced anymore. There's not much point having open software if people aren't actually looking!
As an aside, criminal hackers know all about overflow errors and might be more motivated to find them than innocent developers. I fear that the Heartbleed overflow bug could have been noticed very quickly by hackers who pore over new releases looking for exactly this type of exploit, or equally by the NSA which is reported to have known about it from the beginning.
Where does this leave systems integrators and enterprise developers? Have they become accustomed to taking Open Source Software modules and building them in, without a whole lot of regression testing? There's a lot to be said for Free and Open Source Software (FOSS) but no enterprise can take "free" too literally; the total cost of development has to include reasonable specification, verification and testing of the integrated whole.
As discussed in the wake of Goto Fail, we need to urgently and radically lift coding standards.
'The widely publicised and very serious "gotofail" bug in iOS7 took me back ...
Early in my career I spent seven years in a very special software development environment. I didn't know it at the time, but this experience set the scene for much of my understanding of information security two decades later. I was in a team with a rigorous software development lifecycle; we attained ISO 9001 certification way back in 1998. My company deployed 30 software engineers in product development, 10 of whom were dedicated to testing. Other programmers elsewhere independently wrote manufacture test systems. We spent a lot of time researching leading edge development methodologies, such as Cleanroom, and formal specification languages like Z.
We wrote our own real time multi-tasking operating system; we even wrote our own C compiler and device drivers! Literally every single bit of the executable code was under our control. "Anal" doesn't even begin to describe our corporate culture.
Why all the fuss? Because at Telectronics Pacing Systems, over 1986-1990, we wrote the code for the world's first software controlled implantable defibrillator, the Guardian 4210.
The team spent relatively little time actually coding; we were mostly occupied writing and reviewing documents. And then there were the code inspections. We walked through pseudo-code during spec reviews, and source code during unit validation. And before finally shipping the product, we inspected the entire 40,000 lines of source code. That exercise took five people two months.
For critical modules, like the kernel and error correction routines, we walked through the compiled assembly code. We took the time to simulate the step-by-step operation of the machine code using pen and paper, each team member role-playing parts of the microprocessor (Phil would pretend to be the accumulator, Lou the program counter, me the index register). By the end of it all, we had several people who knew the defib's software like the back of their hand.
And we had demonstrably the most reliable real time software ever written. After amassing several thousand implant-years, we measured a bug rate of less than one in 10,000 lines.
The implant software team had a deserved reputation as pedants. Over 25 person years, the average rate of production was one line of debugged C per team member per day. We were painstaking, perfectionist, purist. And argumentative! Some of our debates were excruciating to behold. We fought over definitions of “verification” and “validation”; we disputed algorithms and test tools, languages and coding standards. We were even precious about code layout, which seemed to some pretty silly at the time.
Yet 20 years later, purists are looking good.
Last week saw widespread attention to a bug in Apple's iOS operating system which rendered website security impotent. The problem arose from a single superfluous line of code – an extra goto statement – that nullified checking of SSL connections, leaving users totally vulnerable to fake websites. The Twitterverse nicknamed the flaw #gotofail.
There are all sorts of interesting quality control questions in the #gotofail experience.
- Was the code inspected? Do companies even do code inspections these days?
- The extra goto was said to be a recent change to the source; if that's the case, what regression testing was performed on the change?
- How are test cases selected?
- For something as important as SSL, are there not test rigs with simulated rogue websites to stress test security systems before release?
There seems to have been egregious shortcomings at every level : code design, code inspection, and testing.
A lot of attention is being given to the code layout. The spurious goto is indented in such a way that it appears to be part of a branch, but it is not. If curly braces were used religiously, or if an automatic indenting tool was applied, then the bug would have been more obvious (assuming that the code gets inspected). I agree of course that layout and coding standards are important, but there is a much more robust way to make source code clearer.
Beyond the lax testing and quality control, there is also a software-theoretic question in all this that is getting hardly any attention: Why are programmers using ANY goto statements at all?
I was taught at college and later at Telectronics to avoid goto statements at all cost. Yes, on rare occasions a goto statement makes the code more compact, but with care, a program can almost always be structured to be compact in other ways. Don't programmers care anymore about elegance in logic design? Don't they make efforts to set out their code in a rigorous structured manner?
The conventional wisdom is that goto statements make source code harder to understand, harder to test and harder to maintain. Kernighan and Ritchie - UNIX pioneers and authors of the classic C programming textbook - said the goto statement is "infinitely abusable" and it "be used sparingly if at all." Before them, one of programming's giants, Edsger Dijkstra, wrote in 1968 that "The go to statement ... is too much an invitation to make a mess of one's program"; see Go To Statement Considered Harmful. The goto creates spaghetti code. The landmark structured programming language PASCAL doesn't even have a goto statement! At Telectronics our coding standard prohibited without exception gotos in all implantable software.
Hard to understand, hard to test and hard to maintain is exactly what we see in the flawed iOS7 code. The critical bug never would have happened if Apple too banned the goto.
Now, I am hardly going to suggest that fanatical coding standards and intellectual rigor are sufficient to make software secure (see also "Security Isn’t Secure). It's unlikely that many commercial developers will be able to cost-justify exhaustive code walkthroughs when millions of lines are involved even in the humble mobile phone. It’s not as if lives depend on commercial software.
Or do they?!
Let’s leave aside that vexed question for now and return to fundamentals.
The #gotofail episode will become a text book example of not just poor attention to detail, but moreover, the importance of disciplined logic, rigor, elegance, and fundamental coding theory.
A still deeper lesson in all this is the fragility of software. Prof Arie van Deursen nicely describes the iOS7 routine as "brittle". I want to suggest that all software is tragically fragile. It takes just one line of silly code to bring security to its knees. The sheer non-linearity of software – the ability for one line of software anywhere in a hundred million lines to have unbounded impact on the rest of the system – is what separates development from conventional engineering practice. Software doesn’t obey the laws of physics. No non-trivial software can ever fully tested, and we have gone too far for the software we live with to be comprehensively proof read. We have yet to build the sorts of software tools and best practice and habits that would merit the title "engineering".
I’d like to close with a philosophical musing that might have appealed to my old mentors at Telectronics. We have reached a sort of pinnacle in post-modernism where the real world has come to pivot precariously on pure text. It is weird and wonderful that engineers are arguing about the layout of source code – as if they are poetry critics.
We have come to depend daily on great obscure texts, drafted not by people we can truthfully call "engineers" but by a largely anarchic community we would be better of calling playwrights.
That is, information security is not intellectually secure. Almost every precept of orthodox information security is ready for a shake-up. Infosec practices are built on crumbling foundations.
UPDATE: I've been selected to speak on this topic at the 2014 AusCERT Conference - the biggest information security event in Australasia.
The recent tragic experience of data breaches -- at Target, Snapchat, Adobe Systems and RSA to name a very few -- shows that orthodox information security is simply not up to the task of securing serious digital assets. We have to face facts: no amount of today's conventional security is ever going to protect assets worth billions of dollars.
Our approach to infosec is based on old management process standards (which can be traced back to ISO 9000) and a ponderous technology neutrality that overly emphasises people and processes. The things we call "Information Security Management Systems" are actually not systems that any engineer would recognise but instead are flabby sets of documents and audit procedures.
"Continuous security improvement" in reality is continuous document engorgement.
Most ISMSs sit passively on shelves and share drives doing nothing for 12 months, until the next audit, when the papers become the centre of attention (not the actual security). Audit has become a sick joke. ISO 27000 and PCI assessors have the nerve to tell us their work only provides a snapshot, and if a breach occurs between visits, it's not their fault. In their words they admit therefore that audits do not predict performance between audits. While nobody is looking, our credit card numbers are about as secure as Schrodinger's Cat!
The deep problem is that computer systems have become so very complex and so very fragile that they are not manageable by traditional means. Our standard security tools, including Threat & Risk Assessment and hierarchical layered network design, are rooted in conventional engineering. Failure Modes & Criticality Analysis works well in linear systems, where small perturbations have small effects, but IT is utterly unlike this. The smallest most trivial omission in software or in a server configuration can have dire and unlimited consequences. It's like we're playing Jenga.
Update: Barely a month after I wrote this blog, we heard about the "goto fail" bug in the Apple iOS SSL routines, which resulted from one spurious line of code. It might have been more obvious to the programmer and/or any code reviewer had the code been indented differently or if curly braces were used rigorously.
Security needs to be re-thought from the ground up. We need some bigger ideas.
We need less rigid, less formulaic security management structures, to encourage people at the coal face to exercise their judgement and skill. We need straight talking CISOs with deep technical experience in how computers really work, and not 'suits' more focused on the C-suite than the dev teams. We have to stop writing impenetrable hierarchical security policies and SOPs (in the "waterfall" manner we recognised decades ago fails to do much good in software development). And we need to equate security with software quality and reliability, and demand that adequate time and resources be allowed for the detailed work to be done right.
If we can't protect credit card numbers today, we urgently need to do things differently, standing as we are on the brink of the Internet of Things.
I am speaking at next week's AusCERT security conference, on how to make privacy real for technologists. This is an edited version of my conference abstract.
Privacy by Design is a concept founded by the Ontario Privacy Commissioner Dr. Ann Cavoukian. Dubbed "PbD", it's basically the same good idea as designing in quality, or designing in security. It has caught on nicely as a mantra for privacy advocates worldwide. The trouble is, few designers or security professionals can tell what it means.
Privacy continues to be a bit of a jungle for security practitioners. It's not that they're uninterested in privacy; rather, it's rare for privacy objectives to be expressed in ways they can relate to. Only one of the 10 or 11 or more privacy principles we have in Australia is ever labelled "security" and even then, all it will say is security must be "reasonable" given the sensitivity of the Personal Information concerned. With this legalistic language, privacy is somewhat opaque to the engineering mind; security professionals naturally see it as meaning little more than encryption and maybe some access control.
To elevate privacy practice from the personal plane to the professional, we need to frame privacy objectives in a way that generates achievable design requirements. This presentation will showcase a new methodology to do this, by extending the familiar standardised Threat & Risk Assessment (TRA). A hybrid Privacy & Security TRA adds extra dimensions to the information asset inventory. Classically an information asset inventory accounts for the confidentiality, integrity and availability (C.I.A.) of each asset; the extended methodology goes further, to identify which assets represent Personal Information, and for those assets, lists privacy related attributes like consent status, accessibility and transparency. The methodology also broadens the customary set of threats to include over-collection, unconsented disclosure, incomplete responses to access requests, over-retention and so on.
The extended TRA methodology brings security and privacy practices closer together, giving real meaning to the goal of Privacy by Design. Privacy and security are sometimes thought to be in conflict, and indeed they often are. We should not sugar coat this; after all, systems designers are of course well accustomed to tensions between competing design objectives. To do a better job at privacy, security practitioners need new tools like the Security & Privacy TRA to surface the requirements in an actionable way.
The hybrid Threat & Risk Assessment
TRAs are widely practiced during requirements analysis stages of large information systems projects. There are a number of standards that guide the conduct of TRAs, such as ISO 31000. A TRA first catalogues all information assets controlled by the system, and then systematically explores all foreseeable adverse events that threaten those assets. Relative risk is then gauged, usually as a product of threat likelihood and severity, and the set of threats to be prioritised according to importance. Threat mitigations are then considered and the expected residual risks calculated. An especially good thing about a formal TRA is that it presents management with the risk profile to be expected after the security program is implemented, and fosters consciousness of the reality that finite risks always remain.
The diagram below illustrates a conventional TRA workflow (yellow), plus the extensions to cover privacy design (red). The important privacy qualities of Personal Information assets include Accessibility, Permissibility (to disclose), Sensitivity (of e.g. health information), Transparency (of the reasons for collection) and Quality. Typical threats to privacy include over-collection (which can be an adverse consequence of excessive event logging or diagnostics), over-disclosure, incompleteness of records furnished in response to access requests, and over-retention of PI beyond the prima facie business requirement. When it comes to mitigating privacy threats, security practitioners may be pleasantly surprised to find that most of their building blocks are applicable.
The hybrid Security-Privacy Threat & Risk Assessment will help ICT practitioners put Privacy by Design into practice. It helps reduce privacy principles to information systems engineering requirements, and surfaces potential tensions between security practices and privacy. ICT design frequently deals with competing requirements. When engineers have the right tools, they can deal properly with privacy.
I have come to believe that a systemic conceptual shortfall affects typical technologists' thinking about privacy. It may be that engineers tend to take literally the well-meaning slogan that "privacy is not a technology issue". I say this in all seriousness.
Online, we're talking about data privacy, or data protection, but systems designers tend to bring to work a spectrum of personal outlooks about privacy in the human sphere. Yet what matters is the precise wording of data privacy law, like Australia's Privacy Act. To illustrate the difference, here's the sort of experience I've had time and time again.
During the course of conducting a PIA in 2011, I spent time with the development team working on a new government database. These were good, senior people, with sophisticated understanding of information architecture. But they harboured restrictive views about privacy. An important clue was the way they referred to "private" information rather than Personal Information (or equivalently, Personally Identifiable Information, PII). After explaining that Personal Information is the operable term in Australian legislation, and reviewing its definition from the Privacy Act, we found that the team had failed to appreciate the extent of the PI in their system. They overlooked that most of their audit logs collect PI, albeit indirectly and automatically. Further, they had not appreciated that information about clients in their register provided by third parties was also PI (despite it being intuitively "less private" by virtue of originating from others). I attributed these blind spots to the developers' weak and informal frame of "private" information. Online and in data privacy law alike, things are very crisp. The definition of Personal Information -- namely any data relating to an individual whose identity is readily apparent -- sets a low bar, embracing a great many data classes and, by extension, informatics processes. It's a nice analytical definition that is readily factored into systems analysis. After the team got that, the PIA in question proceeded apace and we found and rectified several privacy risks that had gone unnoticed.
Here are some more of the many recurring misconceptions I've noticed over the past decade:
- "Personal" Information is sometimes taken to mean especially delicate information such as payment card details, rather than any information pertaining to an identifiable individual such as email addresses in many cases; an exchange between US data breach analyst Jake Kouns and me over the Epsilon incident in 2011 is revealing of a technologists' systemically narrow idea of PII;
- the act of collecting PI is sometimes regarded only in relation to direct collection from the individual concerned; technologists can overlook that PI provided by a third party to a data custodian is nevertheless being collected by the custodian, and they can fail to appreciate that generating PI internally, through event logging for instance, can also represent collection
- even if they are aware of points such as Australia's Access and Correction Principle, database administrators can be unaware that, technically, individuals requesting a copy of information held about them should also be provided with pertinent event logs; a non-trivial case where individuals can have a genuine interest in reviewing event logs is when they want to know if an organisation's staff have been accessing their records.
These instances, among many others in my experience working across both information security and privacy, show that ICT practitioners suffer important gaps in their understanding. Security professionals in particular may be forgiven for thinking that most legislated Privacy Principles are legal niceties irrelevant to them, for generally only one of the principles in any given set is overtly about security; see:
- no. 5 of the eight OECD Privacy Principles
- no. 4 of the five Fair Information Practice Principles in the US
- no. 8 of the ten Generally Accepted Privacy Principles of the US and Canadian accounting bodies,
- no. 4 of the ten old National Privacy Principles of Australia, and
- no. 11 of the 13 new Australian Privacy Principles (APPs).
Yet every one of the privacy principles is impacted by information technology and security practices; see Mapping Privacy requirements onto the IT function, Privacy Law & Policy Reporter, Vol. 10.1& 10.2, 2003. I believe the gaps in the privacy knowledge of ICT practitioners are not random but are systemic, probably resulting from privacy training for non-privacy professionals being ad hoc and not properly integrated with their particular world views.
To properly deal with data privacy, ICT practitioners need to have privacy framed in a way that leads to objective design requirements. Luckily there already exist several unifying frameworks for systematising the work of dev teams. One example that resonates strongly with data privacy practice is the Threat & Risk Assessment (TRA).
The TRA is an infosec requirements analysis tool, widely practiced in the public and private sectors. There are a number of standards that guide the conduct of TRAs, such as ISO 31000. A TRA is used to systematically catalogue all foreseeable adverse events that threaten an organisation's information assets, identify candidate security controls (considering technologies, processes and personnel) to mitigate those threats, and most importantly, determine how much should be invested in each control to bring all risks down to an acceptable level. The TRA process delivers real world management decisions, understanding that non zero risks are ever present, and that no organisation has an unlimited security budget.
I have found that in practice, the TRA exercise is readily extensible as an aid to Privacy by Design. A TRA can expressly incorporate privacy as an attribute of information assets worth protecting, alongside the conventional security qualities of confidentiality, integrity and availability ("C.I.A."). A crucial subtlety here is that privacy is not the same as confidentiality, yet many frequently conflate the two. A fuller understanding of privacy leads designers to consider the Collection, Use, Disclosure and Access & Correction principles, over and above confidentiality when they analyse information assets.
Lockstep continues to actively research the closer integration of security and privacy practices.
I'm an ex software "engineer" [I have reservations about that term] with some life experience of ultra high rel development practices. It's fascinating how much about software quality I learned in the 1980s and 90s is relevant to info sec today.
I've had a trip down memory lane triggered by Karen Sandler's presentation at LinuxConf12 in Ballarat http://t.co/xvUkkaGl and her paper "Killed by code".
The software in implantable defibrillators
I'm still working my way through Karen Sandler's materials. So this post is a work in progress.
What's really stark on first viewing of Karen's talk is the culture she experienced and how it differs from the implantable defib industry I knew in its beginnings 25 years ago.
Karen had an incredibly hard and very off-putting time getting the company that made her defib to explain their software. But when we started in this field, every single person in the company -- and many of our doctors -- would have been able to answer the question What software does this defib run on?: the answer was "ours". And moreover, the FDA were highly aware of software quality issues. The whole medical device industry was still on edge from the notorious Therac 25 episode, a watershed in software verification.
A personal story
I was part of the team that wrote the code for the world's first software controlled implantable cadrioverter/defibrillator (ICD).
In 1990 Telectronics (a tragic legend of Australian technology) released the model 4210, which was just the fourth or fifth ICD on the market (the first few being hard-wired devices from CPI Inc. and Telectronics). The computing technology was severely restricted by several design factors, most especially ultra low power consumption, and a very limited number of microprocessor vendors that would warrant their chips for use in medical devices. The 4210 defib used a semi-customised 8 bit micro-controller based on the 6502, and a 32 KB byte-organised SRAM chip that held the entire executable. The micro clocked at 128kHz, fully eight times slower than the near identical micro in the Apple II a decade earlier. The software had to be efficient, not only to ensure it could make some very tough real time rendezvous, but to keep the power consumption down; the micro consumed about 30% of the device's power over its nominal five year lifetime.
We wrote mostly in C, with some assembly coding for the kernel and some performance sensitive routines. The kernel was of our own design, multi-tasking, with hard real time performance requirements (in particular, for obvious reasons the system had to respond within tight specs to heart beat interrupts and we had to show we weren't ever going to miss an interrupt!) We also wrote the C compiler.
The 4210's software was 40,000 lines of C, developed by a team of 5-6 over several years; the total effort was 25 person-years. Some of the testing and pre-release validation is described in my blog post about coding being like play writing. The final code inspection involved a team of five working five-to-six hour days for two months, reading aloud and understanding every single line. When occasion called for checking assembly instructions, sometimes we took turns with pencil and paper pretending to be the accumulators, the index registers, the program counter and so on. No stone was left unturned.
The final walk-through was quite a personnel management challenge. One of the senior engineers (a genius who also wrote our kernel and compiler) lobbied for inspecting the whole executable because he didn't want to rely on the correctness of the compiler -- but that would have taken six months. So we compromised by walking through only the assembly code for the critical modules, like the tachycardia detector and the interrupt handlers.
I mentioned that the kernel and compiler were home-grown. So this meant that the company controlled literally every single bit of code running in its defibs. And I reiterate we had several individuals who knew the source code end to end.
By the way, these days I will come across developers in the smartcard industry who find it hard working on applets that are 5 or 10 kilobytes small. Compare say a digital signing applet with an ICD, with its cardiac monitoring algorithms, treatment algorithms, telemetry controller, data logging and operating system all squeezed into 32KB.
We amassed several thousand implant-years of experience with the 4210 before it was superseded. After release, we found two or three minor bugs, which we fixed with software upgrades. None would have caused a misfire, neither false positive or false negative.
Yes, for the upgrade we could write into the RAM over a proprietary telemetry protocol. In fact the main reason for the one major software upgrade in the field was to add error correction because after hundreds of device-years we noticed higher than expected bit flips from natural background radiation. That's a helluva story in itself. It was observed that had the code been in ROM, we couldn't have changed it but we wouldn't have had to change it for bit flips either.
Morals of the story
Anyway, some of the morals of the story so far:
- Software then was cool and topical, and the whole company knew how to talk about it. The real experts -- the dozen or so people in Sydney directly involved in the development -- were all well known worldwide by the executives, the sales reps, the field clinical engineers, and regulatory affairs. And we got lots of questions (in contrast to Karen Sandler's experience where all the caridologists and company people said nobody ever asked about the code).
- Everything about the software was controlled by the company: the operating system, the chip platform, the compiler, the telemetry protocol.
- We had a team of people that knew the code like the backs of their hands. Better in fact. It was reliable and, in hindsight, impregnable. Not that we worried about malware back in 1987-1990.
Where has software development gone?
So the sorts of issues that Karen Sandler is raising now, over two decades on, are astonishing to me on so many levels.
- Why would anyone decide to write life support software on someone else's platform?
- Why would they use wifi or Bluetooth for telemetry?
- And if the medical device companies cut corners in software development, one wonders what the defense industry is doing with their drone flight controllers and other "smart" weaponry with its countless millions of lines of opaque software.
The software-as-a-profession debate continues largely untouched by each generation's innovations in production methods. Most of us in the 90s thought that formal methods, reuse and enforced modularity would introduce to software some of the hallmarks of real engineering: predictability, repeatability, measurability and quality. Yet despite Objected Oriented methods and sophisticated CASE tools, many of the human traits of software-as-a-craft remain with us.
The "software crisis" – the systemic inability to estimate software projects accurately, to deliver what's promised, and to meet quality expectations – is over 40 years old. Its causes are multiple and subtle, and despite ample innovation in languages, tools and methodologies, debates continue over what ails the software industry. The latest skirmish is a provocative suggestion from Forrester analyst Mark Gualtieri that we shift the emphasis from independent Quality Assurance back onto developers’ own responsibilities, or abandon QA altogether. There has been a strong reaction! The general devotion to QA I think is aided and abetted by today’s widespread fashion for corporate governance.
But we should listen to radical ideas like Gualtieri’s, rather than maintain a slavish devotion to orthodoxies. We should recognise that software engineering is inherently different from conventional engineering, because software itself is a different kind of material. Its properties make it less amenable to regular governance.
Simply, software does not obey the laws of physics. Building skyscrapers, tunnels, dams and bridges is relatively predictable. You start with site surveys and foundations, erect a sturdy framework and all sorts of temporary formers, flesh out the structure, fill in the services like power and plumbing, do the fit-out, and finally take away all the scaffolding.
Specifications for real engineering projects don’t change much, even over several years for really big initiatives. And the engineering tools don't change at all.
Software is utterly unlike this. You can start writing software anywhere you like, and before the spec is signed off. There aren’t any raw materials to specify and buy, and no quantity surveyors to deal with. Metaphorically speaking, the plumbing can go in before the framework. Hell, you don't even need a framework! Nothing physical holds a software system up. Flimsy software, as a material, is indistinguishable from the best code. No laws of physics dictate that you start at the bottom and work your way slowly upwards, using a symbiosis of material properties and gravity to keep your construction stable and well-behaved as it grows. The process of real world engineering is thick with natural constraints that ensure predictability (just imagine how wobbly a house would be if you could lay the bricks from the top down) whereas software development processes are almost totally arbitrary, except for the odd stricture imposed by high level languages.
Real world systems are thoroughly compartmentalised naturally. If a bearing fails in an air-conditioning plant in the basement, it’s not going to affect the integrity of any of the floor plates. On the other hand, nothing physically decouples lines of code; a bug in one part of the program can impinge on almost any other part (which incidentally renders traditional failure modes and effects analysis impossible). We only modularise software by artificial means, like banning goto statements and self-modifying code.
Coding is fast and furious. In a single day, a programmer can create a system probably more complex than an airport that takes more than 10,000 person-years to build. And software development is tremendous creative fun. Let's be honest: it's why the majority of programmers chose their craft in the first place.
Ironically the rapidity of programming contributes significantly to software project overruns. We only use software in information systems because it's faster to make and easier to modify than wired logic. So the temptation is irresistible to keep specs fluid and to accommodate new requirements at any time. Famously, the differences between prototype, beta and production product are marginal and arbitrary. Management and marketing take advantage of this fact, and unfortunately software engineers themselves yield too readily to the attraction of the last minute tweak.
I suggest programming is more like play writing than engineering, and many programmers (especially the really good ones!) are just as manageable as poets.
In both software and play writing, structure is almost entirely arbitrary. Because neither obey the laws of physics, the structure of software and plays comes from the act of composition. A good software engineer will know their composition from end to end. But another programmer can always come along and edit the work, inserting their own code as they see fit. It is received wisdom in programming that most bugs arise from imprudent changes made to old code.
Messing with a carefully written piece of software is fraught with danger, just as it is with a finished play. I could take Hamlet for instance, and hack it as easily as I might hack an old program -- add a character or two, or a whole new scene -- but the entire internal logic of the play would almost certainly be wrecked. It would be “buggy”.
I was a software development manager for some years in the cardiac pacemaker industry. We developed the world’s first software controlled automatic implantable defibrillator. It had several tens of thousands of lines of C, developed at a rate of about one tested line of code per person per day. We quantified it as the most reliable real time software ever written at the time.
I believe the outstanding quality resulted from a handful of special grassroots techniques:
- We had independent software test teams that developed their own test cases and tools
- We did obsessive source code inspections on all units before integration. And in the end, before we shipped, we did an end-to-end walkthrough of the frozen software. It took six of us two months straight. So we had several people who knew the entire object intimately.
- We did early informal design reviews. As team leader, I favoured having my developers do a whiteboard presentation to the team of their early design ideas, no more than 48 hours after being given responsibility for a module. This helped prevent designers latching onto misconceptions at the formative stages.
- We took our time. I was concerned that the CASE tools we introduced in the mid 90s might make code rather too easy to trot out, so at the same time I set a new rule that developers had toturn their workstations off for a whole day once a week, and work with pen and paper.
- My internal coding standard included a requirement that when starting a new module, developers write their comments before they write their code, and their comments had to describe ‘why’ not ‘what’. Code is all syntax; the meaning and intent of any software can only be found in the natural language comments.
Code is so very unlike the stuff of other professions – soil and gravel, metals and alloys, nuts and bolts, electronics, even human flesh and blood - that the metaphor of engineering in the phrase “software engineering” may be dangerously misplaced. By coopting the term we have might have started out on the wrong foot, underestimating the fundamental challenge of forging a software profession. It won't be until software engineering develops the normative tools and standards, culture and patience of a true profession that the software crisis will turn around. And then corporate governance will have something to govern in software development.