The media gets excited about gene therapy. With the sequencing of genomes becoming ever cheaper and more accessible, a grand vision of gene therapy is now being put about all too casually by futurists in which defective genetic codes are simply edited out and replaced by working ones. At the same time there is the broader idea of "Precision Medicine" which envisages doctors scanning your entire DNA blueprint, instantly spotting the defects that ail you, and ordering up a set of customized pharmaceuticals precisely fitted to your biochemical idiosyncrasies.
There is more to gene therapy -- genetic engineering of live patients -- than the futurists let on.
A big question for mine is this: How, precisely, will the DNA repairs be made? Lay people might be left to presume it's like patching your operating system, which is not a bad metaphor, until you think a bit more about how and where patches are made to a computer.
A computer has one copy of any given software, stored in long term memory. And operating systems come with library functions for making updates. Patching software involves arriving with a set of corrections in a file, and requesting via APIs that the corrections be slotted into the right place, replacing the defective code.
But DNA doesn't work like this. While the genome is indeed something of an operating system, that's not the whole story. Sub-systems for making changes to the genome are not naturally built into an organism, because genes are only supposed to change at the time the software is installed. Our genomes are carved up en masse when germ cells (eggs and sperm) are made, and the genomes are put back together when we have sex, and then passed into our children. There is no part of the genetic operating system that allows selected parts of the genetic source code to be edited later, and -- this is the crucial bit -- spread through a living organism.
Genetic engineering, such as it is today, involves editing the genomes of embryos at a very early stage of their lifecycle, so the changes propagate as the embryo grows. Thus we have tomatoes fitted with arctic fish genes to stave off cold, and canola that resists pesticides. But the idea that's presented of gene therapy is very different; it has to impose changes to the genome in all the trillions of copies of the code in every cell in a fully developed organism. You see, there's another crucial thing about the DNA-is-software metaphor: there is no central long term program memory for our genes. Instead the DNA program is instantiated in every single cell of the body.
To change the DNA in a mature cell, geneticists have to edit it by means other than sexual reproduction. As I noted, there is no natural "API" for doing this, so they've invented a clever trick, co-opting viruses - nature's DNA hackers. Viruses work by squeezing their minuscule bodies through the cell membranes of a host organism, latching onto DNA strands inside, and crudely adding their own code fragments, pretty much at random, into the host's genome. Viruses are designed (via evolution) to inject arbitrary genes into another organism's DNA (arbitrary relative to the purpose of the host's DNA, that is). Viruses are just what gene therapists need to edit faulty DNA in situ.
I know a bit about cystic fibrosis and the visions for a genetic cure. The faulty gene that causes CF was identified decades ago and its effect on chloride chemistry is well understood. By disrupting the way chloride ions are handled in cells, CF ruins mucous membranes, with particularly bad results for the lungs and digestive system. From the 1980s, it was thought that repairs to the CF gene could be delivered to cells in the lung lining by an engineered virus carried in an aerosol. Because only a small fraction of cells exposed to the virus could have their genes so updated, scientists expected that the repairs would be both temporary and partial, and that fresh viruses would need to be delivered every few weeks, a period determined by the rate at which lung cells die and get replaced.
Now please think about the tacit promises of gene therapy today. The story we hear is essentially all about the wondrous informatics and the IT. Within a few years, we're told, doctors will be able to sequence a patient's entire genome for a few dollars in a few minutes, using a desktop machine in the office. It's all down to Moore's Law and computer technology. There's an assumption that as the power goes up and the costs go down, geneticists will in parallel work out what all the genes mean, including how they interact, and develop a catalog of known faults and logical repairs.
Let's run with that optimism (despite the fact that just a few years ago they found that "Junk DNA" turns out to be active in ways that were not predicted; it's a lot like Dark Matter - important, ubiquitous and mysterious). The critical missing piece of the gene therapy story is how the patches are going to be made. Some reports imply that a whole clean new genome can be synthesised and somehow installed in the patient. Sorry, but how?
For thirty years they've tried and failed to rectify the one cystic fibrosis gene in readily accessible lung cells. Now we're supposed to believe that whole stretches of DNA are going to be swapped out in all the cells of the body? It's vastly harder than the CF problem, along at least three dimensions: (1) the numbers and complexity of the genes involved, (2) the numbers of cells and tissue systems that need to be patched all at once, and (3) the delivery mechanism for getting modified viruses (I guess) where they need to do their stuff.
It's so easy being a futurist. People adore your vision, and you don't need to worry about practicalities. The march of technology, seen with 20/20 hindsight, appears to make all dreams come true. Practicalities are left to sort themselves out.
But I think it takes more courage to say, of gene therapy, it's not going to happen.
It would be naive to expect the White House Cybersecurity Summit to have been less political. President Obama and his colleagues were in their comfort zone, talking up America's recent economic turnaround, and framing their recent wins squarely within Silicon Valley where the summit took place. With a few exceptions, the first two hours were more about green energy, jobs and manufacturing than cyber security. It was a lot like a lost episode of The West Wing.
The exceptions were important. Some speakers really nailed some security issues. I especially liked the morning contributions from Intel President Renee James and MasterCard CEO Ajay Banga. James highlighted that Intel has worked for 10 years to improve "the baseline of computing security", making her one of the few speakers to get anywhere near the inherent insecurity of our cyber infrastructure. The truth is that cyberspace is built on weak foundations; the software development practices and operating systems that bear the economy today were not built for the job. For mine, the Summit was too much about military/intelligence themed information sharing, and not enough about why our systems are so precarious. I know it's a dry subject but if they're serious about security, policy makers really have to engage with software quality and reliability, instead of thrilling to kids learning to code. Software development practices are to blame for many of our problems.
Ajay Banga was one of several speakers to urge the end of passwords. He summed up the authentication problem very nicely: "Stop making us remember things in order to prove who we are". He touched on MasterCard's exploration of continuous authentication bracelets and biometrics (more news of which coincidentally came out today). It's important however that policy makers' understanding of digital infrastructure resilience, cybercrime and cyber terrorism isn't skewed by everyone's favourite security topic - customer authentication. Yes, it's in need of repair, yet authentication is not to blame for the vast majority of breaches. Mom and Pop struggle with passwords and they deserve better, but the vast majority of stolen personal data is lifted by organised criminals en masse from poorly secured back-end databases. Replacing customer passwords or giving everyone biometrics is not going to solve the breach epidemic.
Banga also indicated that the Information Highway should be more like road infrastructure. He highlighted that national routes are regulated, drivers are licensed, there are rules of the road, standardised signs, and enforcement. All these infrastructure arrangements leave plenty of room for innovation in car design, but it's accepted that "all cars have four wheels".
Tim Cook was then the warm-up act before Obama. Many on Twitter unkindly branded Cook's speech as an ad for Apple, paid for by the White House, but I'll accentuate the positives. Cook continues to campaign against business models that monetize personal data. He repeated his promise, made after the Apple Pay launch, that Apple will not exploit the data it has on its customers. He put privacy before security in everything he said.
Cook painted a vision where digital wallets hold your passport, driver license and other personal documents, under the user's sole control, and without trading security for convenience. I trust that he's got the mobile phone Secure Element in mind; until we can sort out cybersecurity at large, I can't support the counter trend towards cloud-based wallets. The world's strongest banks still can't guarantee to keep credit card numbers safe, so we're hardly ready to put our entire identities in the cloud.
In his speech, President Obama reiterated his recent legislative agenda for information sharing, uniform breach notification, student digital privacy, and a Consumer Privacy Bill of Rights. He stressed the need for private-public partnership and cybersecurity responsibility to be shared between government and business. He reiterated plans for the new Cyber Threat Intelligence Integration Center. And as flagged just before the summit, the president signed an Executive Order that will establish cyber threat information sharing "hubs" and standards to foster sharing while protecting privacy.
Obama told the audience that cybersecurity "is not an ideological issue". Of course that message was actually for Congress which is deliberating over his cyber legislation. But let's take a moment to think about how ideology really does permeate this arena. Three quasi-religious disputes come to mind immediately:
- Free speech trumps privacy. The ideals of free speech have been interpreted in the US in such a way that makes broad-based privacy law intractable. The US is one of only two major nations now without a general data protection statute (the other is China). It seems this impasse is rarely questioned anymore by either side of the privacy debate, but perhaps the scope of the First Amendment has been allowed to creep out too far, for now free speech rights are in effect being granted even to computers. Look at the controversy over the "Right to be Forgotten" (RTBF), where Google is being asked to remove certain personal search results if they are irrelevant, old or inaccurate. Jimmy Wales claims this requirement harms "our most fundamental rights of expression and privacy". But we're not talking about speech here, or even historical records, but rather the output of a computer algorithm, and a secret algorithm at that, operated in the service of an advertising business. The vociferous attacks on RTBF are very ideological indeed.
- "Innovation" trumps privacy. It's become an unexamined mantra that digital businesses require unfettered access to information. I don't dispute that some of the world's richest ever men, and some of the world's most powerful ever corporations have relied upon the raw data that exudes from the Internet. It's just like the riches uncovered by the black gold rush of the 1800s. But it's an ideological jump to extrapolate that all cyber innovation or digital entrepreneurship must continue the same way. Rampant data mining is laying waste to consumer confidence and trust in the Internet. Some reasonable degree of consumer rights regulation seems inevitable, and just, if we are to avert a digital Tragedy of the Commons.
- National Security trumps privacy. I am a rare privacy advocate who actually agrees that the privacy-security equilibrium needs to be adjusted. I believe the world has changed since some of our foundational values were codified, and civil liberties are just one desirable property of a very complicated social system. However, I call out one dimensional ideology when national security enthusiasts assert that privacy has to take a back seat. There are ways to explore a measured re-calibration of privacy, to maintain proportionality, respect and trust.
President Obama described the modern technological world as a "magnificent cathedral" and he made an appeal to "values embedded in the architecture of the system". We should look critically at whether the values of entrepreneurship, innovation and competitiveness embedded in the way digital business is done in America could be adjusted a little, to help restore the self-control and confidence that consumers keep telling us is evaporating online.
The ongoing debate (or spat) on Twitter about the "No Estimates" movement had me reaching for the archives.
Some now say that being forced to provide estimates is somehow counter-productive for software developers. I've long thought about programming productivity, and the paradox that software is too soft.
Some programmers want special treatment. In effect, "No Estimates" proponents are claiming their particular work is not amenable to traditional metrics and management. Now in a way, they're right; there is as yet no such thing as software "engineering". There are none of the handbooks or standards that feature in chemical, mechanical and electrical engineering. But nevertheless, if a programmer knows what they're doing - if they know their subject matter and how their code behaves - then providing estimates is not all that difficult. Disclaiming one's ability to predict how long a task will take is a weird way to try and engage with the business.
Software is definitely a difficult medium. It's highly non-linear, and breeds amazing complexity. But a great many of today's problems, like the recent #gotofail and Heartbleed scandals, are manifestly due to chaotic development practices.
As such, programmers are part of the problem.
I once wrote a letter to the editor of ComputerWorld about this ...
Yes indeed, IT is made the scapegoat for a great many project disasters (ComputerWorld 28 September, 2005, page 1). But it may prove fruitless to force orthodox project management and corporate governance methodologies onto big IT projects. And at the same time, IT "professionals" are not entirely free of blame.
So the KPMG Global IT Project Management Survey found that the vast majority of technology projects run over budget. In the main, "technology" means software, whether we build or buy. The "software crisis" - the systemic inability to estimate software projects accurately and to deliver what's promised - is about 40 years old. And it's more subtle than KPMG suggests in blaming corporate governance. It is fashionable at the moment to look to governance to rectify business problems but in this case, it really is a technology issue.
Software project management truly is different from all other technical fields, for software does not obey the laws of nature. Building skyscrapers, tunnels, dams and bridges is relatively predictable. You start with site surveys and foundations, erect a sturdy framework, fill in the services, fit it out, and take away the scaffolding. Specifications don't change much over a several year project, and the tools don't change at all.
But with software, you can start a big project anywhere you like, and before the spec is signed off. Metaphorically speaking, the plumbing can go in before the framework. Hell, you don't even need a framework! Nothing physical holds a software system up.
And software coding is fast and furious. In a single day, a programmer can create a system more complex than an airport that might take 10,000 person-years to build. So software development is fun. Let's be honest: it's why the majority of programmers chose their craft in the first place.
Ironically it's the rapidity of programming that contributes the most to project overruns. We only use software in information systems because it's fast to write and easy to modify. So the temptation is irresistible to keep specs fluid and to change requirements at any time. Famously, the differences between prototype, "beta release" and product are marginal and arbitrary. Management and marketing take advantage of this fact, and unfortunately software engineers themselves yield too readily to the attraction of the last minute tweak.
The same dynamics of course afflict third party software components. They tend to change too often and fail to meet expectations, making life hell for IT systems integrators.
It won't be until software engineering develops the tools, standards and culture of a true profession that any of this will change. Then corporate governance will have something to govern in big technology projects. Meanwhile, programmers will remain more like playwrights than engineers, and just as manageable.
Summary: BlackBerry is poised for a fresh and well differentiated play in the Internet of Things, with its combination of handset hardware security, its uniquely rated QNX operating system kernel, and its experience with the FIDO device authentication protocols.
To put it plainly, BlackBerry is not cool.
And neither is security.
But maybe two wrongs can make a right, in terms of a compelling story. BlackBerry's security story has always been strong, it's getting stronger, and it could save them.
Today I attended the BlackBerry Security Summit in New York City (Disclosure: my travel and accommodation were paid by BlackBerry). The event was announced very recently; none of my colleagues had heard of it. So what was the compelling need to put on a security show in New York? It turned out to be the 9:00am announcement that BlackBerry is acquiring the German voice security specialists Secusmart. BlackBerry and Secusmart have worked together for a long time; their stated aim is to put a real secure phone in the "hand of every President and every Chancellor".
Secusmart CEO Hans-Christoph Quelle is a forceful champion of voice security; in this age of evidently routine spying by state and competitors alike, there is enormous demand building for counter-surveillance in telephony and messaging. Secusmart is also responsible for the highly rated Micro SD cards that BlackBerry proudly use as removable security modules in their handsets. And this is where the Secusmart tie-up really resonates for me. It comes hot on the heels of last week's Cloud Security Summit, where there was so much support for personal Hardware Security Modules (HSMs), be they Micro SD cards, USB keys, NFC Secure Elements, the good old "Trusted Platform Module" (TPM) or any number of proprietary chip sets.
Today's event also showcased BlackBerry's QNX division (acquired in 2010) and its secure operating system. CEO John Chen reckons that the software in 50% of connected cars runs on the QNX OS (and in high reliability settings like power stations, wind turbines and even gaming machines, the penetration is even higher). And so he is positioning BlackBerry as a major player in the Internet of Things.
We heard from QNX founder Dan Dodge about the elegance of their system. Dodge stressed that, with the OS at just 100,000 lines of code, his team knows the software inside-out. There is not a single line of code in their OS that QNX did not write themselves. In contrast, such mastery is utterly impossible in the 15,000,000 lines that make up Linux or the estimated 50-70 million lines in Windows. It happens that I've recently lamented the parlous state of software quality and the need to return to first principles security. So I am on Dan Dodge's wavelength.
BlackBerry's security people had a little bit to say about identity as well, and apparently more's to come. For now, they are flagging that with 250 million customers in their messaging system, BBM represents "one of the biggest identity systems in the world". And as such the company does plan to "federate" it somehow. They reminded us at the same time of the BlackBerry Cloud slated for launch in December.
Going forward, the importance of strong, physical Two Factor Authentication for accessing the cloud is almost a given now. And the smartphone is fast becoming the predominant access mechanism, so the combination of secure elements, handsets and high security infrastructure is potent.
There's a lot that BlackBerry is keeping close to its chest, but for me one extant piece of the IoT puzzle was conspicuously absent today: the role of the FIDO Alliance protocols. After all, BlackBerry has been a FIDO Board Member for a long time. It seems to me that FIDO's protocols for exchanging verified authentication signals and information about devices should be an important element of BlackBerry's play in both its software infrastructure and its devices.
In closing, I'll revisit the very first thing we heard at today's event. It was a video testimonial, telling us "If you need nuclear security, you need BlackBerry". As I said, security really isn't cool. Jazzing up the company's ability to deliver "nuclear" grade security to demanding clients is actually not the right message. Security in the Internet of Things -- and therefore in everyday life -- may turn out to be just as important.
We basically know that nuclear power plants are inherently risky; we know that planes will occasionally fall out of the sky. Paradoxically, the community has a reasonable appetite for risk and failures in very complex systems like those. Individually and/or collectively we have decided we just can't live without electricity and travel and so we've come to settle on a roughly acceptable finite cost in terms of failures. But when the mundanities of life go digital, the tolerance of failure will drop. When our cars and thermostats and light switches are connected to the Internet, and when a bug or a script kiddie's stunt can soon send whole neighbourhoods into a spin, consumers won't stand for it.
So the very best security we can currently engineer is in fact going to be necessary at scale for smart appliances, wearables, connected homes, smart meters and networked cars. We need a different gauge for this type of security, and it's going to be very tough to engineer and deploy economically. But right now, with its deep understanding of dependable OS's and commitment to high quality device hardware, it seems to me BlackBerry has a head-start in the Internet of Things.
For the past year, oncologists at the Memorial Sloan Kettering Cancer Centre in New York have been training IBM’s Watson – the artificial intelligence tour-de-force that beat allcomers on Jeopardy – to help personalise cancer care. The Centre explains that "combining [their] expertise with the analytical speed of IBM Watson, the tool has the potential to transform how doctors provide individualized cancer treatment plans and to help improve patient outcomes". Others are speculating already that Watson could "soon be the best doctor in the world".
I have no doubt that when Watson and things like it are available online to doctors worldwide, we will see overall improvements in healthcare outcomes, especially in parts of the world now under-serviced by medical specialists [having said that, the value of diagnosing cancer in poor developing nations is questionable if they cannot go on to treat it]. As with Google's self-driving car, we will probably get significant gains eventually, averaged across the population, from replacing humans with machines. Yet some of the foibles of computing are not well known and I think they will lead to surprises.
For all the wondrous gains made in Artificial Intelligence, where Watson now is the state of the art, A.I. remains algorithmic, and for that, it has inherent limitations that don't get enough attention. Computer scientists and mathematicians have known for generations that some surprisingly straightforward problems have no algorithmic solution. That is, some tasks cannot be accomplished by any universal step-by-step codified procedure. The classic example is the Halting Problem; others, such as the Travelling Salesperson Problem, can be solved in principle but have no known efficient algorithm. If these simple challenges defeat algorithms, we need to be more sober in our expectations of computerised intelligence.
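For readers who want to see why the Halting Problem resists any algorithm, here is a minimal sketch of the usual contradiction argument. It is an illustration only, under stated assumptions: the names `Decider`, `make_contrarian` and `refute` are hypothetical, Python callables stand in for "programs", and any candidate decider is assumed to give the same answer each time it is asked about the same program.

```python
from typing import Callable

Program = Callable[[], None]
# Hypothetical "halting oracle": returns True if it believes the program halts.
Decider = Callable[[Program], bool]


def make_contrarian(halts: Decider) -> Program:
    """Build a program that does the opposite of what `halts` predicts for it."""
    def contrarian() -> None:
        if halts(contrarian):
            while True:  # the decider said "halts", so loop forever
                pass
        # the decider said "loops forever", so return straight away
    return contrarian


def refute(halts: Decider) -> str:
    """Show that the given decider is wrong about at least one program."""
    program = make_contrarian(halts)
    if halts(program):
        # Do not run it: by construction it would never return.
        return "decider said 'halts', but the program loops forever"
    program()  # returns at once, contradicting the decider
    return "decider said 'loops', but the program halted"
```

Whatever candidate is passed in, for instance `refute(lambda p: True)` or `refute(lambda p: False)`, the contrarian program is built to make it wrong, which is the heart of the argument that no universal halting checker can exist.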
A key limitation of any programmed algorithm is that it must make its decisions using a fixed set of inputs that are known and fully characterised (by the programmer) at design time. If you spring an unexpected input on any computer, it can fail, and yet that's what life is all about -- surprises. No mathematician seriously claims that what humans do is somehow magic; most believe we are computers made of meat. Nevertheless, when paradoxes like the Halting Problem abound, we can be sure that computing and cognition are not what they seem. We should hope these conundrums are better understood before putting too much faith in computers doing deep human work.
And yet, predictably, futurists are jumping ahead to imagine "Watson apps" in which patients access the supercomputer for themselves. Even if there were reliable algorithms for doctoring, I reckon the "Watson app" is a giant step, because of the complex way the patient's conditions are assessed and data is gathered for the diagnosis. That is, the taking of the medical history.
In these days of billion dollar investments in electronic health records (EHRs), we tend to think that medical decisions are all about the data. When politicians announce EHR programs they often boast that patients won't have to go through the rigmarole of giving their history over and over again to multiple doctors as they move through an episode of care. This is actually a serious misunderstanding of the importance in clinical decision-making of the interaction between medico and patient when the history is taken. It's subtle. The things a patient chooses to tell, the things they seem to be hiding, and the questions that make them anxious, all guide an experienced medico when taking a history, and provide extra cues (metadata if you will) about the patient’s condition.
Now, Watson may well have the ability to navigate this complexity and conduct a very sophisticated Q&A. It will certainly have a vastly bigger and more reliable memory of cases than any doctor, and with that it can steer a dynamic patient questionnaire. But will Watson be good enough to be made available direct to patients through an app, with no expert human mediation? Or will a host of new input errors result from patients typing their answers into a smart phone or speaking into a microphone, without any face-to-face subtlety (let alone human warmth)? It was true of mainframes and it’s just as true of the best A.I.: Bulldust in, bulldust out.
Finally, Watson's existing linguistic limitations are not to be underestimated. It is surely not trivial that Watson struggles with puns and humour. Futurist Mark Pesce when discussing Watson remarked in passing that scientists don’t understand the "quirks of language and intelligence" that create humour. The question of what makes us laugh does in fact occupy some of the finest minds in cognitive and social science. So we are a long way from being able to mechanise humour. And this matters because for the foreseeable future, it puts a great deal of social intercourse beyond AI's reach.
In between the extremes of laugh-out-loud comedy and a doctor’s dry written notes lies a spectrum of expressive subtleties, like a blush, an uncomfortable laugh, shame, and the humiliation that goes with some patients’ lived experience of illness. Watson may understand the English language, but does it understand people?
Watson can answer questions, but good doctors ask a lot of questions too. When will this amazing computer be able to hold the sort of two-way conversation that we would call a decent "bedside manner"?
For the second time in as many months, a grave bug has emerged in core Internet security software. In February it was the "Goto Fail" bug in the Apple operating system iOS that left web site security inoperable; now we have "Heartbleed", a flaw that leaves many secure web servers in fact open to attackers sniffing memory contents looking for passwords and keys.
Who should care?
There is no shortage of advice on what to do if you're a user. And it's clear how to remediate the Heartbleed bug if you're a web administrator (a fix has been released). But what is the software fraternity going to do to reduce the incidence of these disastrous human errors? In my view, Goto Fail and Heartbleed are emblematic of chaotic software craftsmanship. It appears that goto statements are used with gay abandon throughout web software today, creating exactly the unmaintainable spaghetti code that the founders of Structured Programming warned us about in the 1970s. Testing is evidently lax; code inspection seems non-existent. The Heartbleed flaw is in a piece of widely used Open Source software, and was overlooked first by the programmer, and then by the designated inspector, and then it went unnoticed for two years in the wild.
What are the ramifications of Heartbleed?
"Heartbleed" is a flaw in an obscure low level feature of the "Transport Layer Security" (TLS) protocol. TLS has an optional feature dubbed "Heartbeat" which a computer connected in a secure session can use to periodically test if the other computer is still alive. Heartbeat involves sending a request message with some dummy payload, and getting back a response with duplicate payload. The bug in Heartbeat means the responding computer can be tricked into sending back a dump of 64 kiloytes of memory, because the payload length variable goes unchecked. (For the technically minded, this error is qualitatively similar to a buffer overload; see also the OpenSSL Project description of the bug). Being server memory used in security management, that random grab has a good chance of including sensitive TLS-related data like passwords, credit card numbers and even TLS session keys. The bug is confined to the OpenSSL security library, where it was introduced inadvertently as part of some TLS improvements in late 2011.
The flawed code is present in almost all Open Source web servers, or around 66% of all web servers worldwide. However not all servers on the Internet run SSL/TLS secure sessions. Security experts Netcraft run automatic surveys and have worked out that around 17% of all Internet sites would be affected by Heartbleed – or around half a million widely used addresses. These include many banks, financial services, government services, social media companies and employer extranets. An added complication is that the Heartbeat feature leaves no audit trail, and so a Heartbleed exploit is undetectable.
If you visit an affected site and start a secure ("padlocked") session, then an attacker that knows about Heartbleed can grab random pieces of memory from your session. Researchers have demonstrated that session keys can be retrieved, although it is said to be difficult. Nevertheless, Heartbleed has been described by some of the most respected and prudent commentators as catastrophic. Bruce Schneier for one rates its seriousness as "11 out of 10".
Should we panic?
No. The first rule in any emergency is "Don't Panic". But nevertheless, this is an emergency.
The risk of any individual having been harmed through Heartbleed is probably low, but the consequences are potentially grave (if for example your bank is affected). And soon enough, it will be simple and cheap to take action, so you will hear experts say 'it is prudent to assume you have been compromised' and to change your passwords.
However, you need to wait rather than rush into premature action. Until the websites you use have been fixed, changing passwords now may leave you more vulnerable, because it's highly likely that criminals are trying to exploit Heartbleed while they can. It's best to avoid using any secure websites for the time being. We should redouble the usual Internet precautions: check your credit card and bank statements (but not online for the time being!). Stay extra alert to suspicious looking emails not just from strangers but from your friends and colleagues too, for their cloud mail accounts might have been hacked. And seek out the latest news from your e-commerce sites, banks, government and so on. The Australian banks for instance were relatively quick to act; by April 10 the five biggest institutions confirmed they were safe.
Heartbleed for me is the last straw. I call it pathetic that mission critical code can harbour flaws like this. So for a start, in the interests of clarity, I will no longer use the term "Software Engineering". I've written a lot in the past about the practice and the nascent profession of programming but it seems we're just going backwards. I'm aware that calling programming a "craft" will annoy some people; honestly, I mean no offence to basket weavers.
I'm no merchant of doom. I'm not going to stop banking and shopping online (however I do refuse Internet facing Electronic Health Records, and I would not use a self-drive car). My focus is on software development processes and system security.
The modern world is increasingly dependent on software, so it passes understanding we still tolerate such ad hoc development processes.
The programmer responsible for the Heartbleed bug has explained that he made a number of changes to the code and that he "missed validating a variable" (referring to the unchecked length of the Heartbeat payload). The designated reviewer of the OpenSSL changes also missed that the length was not validated. The software was released into the wild in March 2012. It went unnoticed (well, unreported) until a few weeks ago and was rectified in an OpenSSL release on April 7.
I'd like to avoid apportioning individual blame, so I am not interested in the names of the programmer and the reviewer. But we have to ask: when so many application security issues boil down to overflow problems, why is it not second nature to watch out for bugs like Heartbleed? How did experienced programmers make such an error? Why was this flaw out in the wild for two years before it was spotted? I thought one of the core precepts of Open Source Software was that having many eyes looking over the code means that errors will be picked up. But code inspection seems not to be widely practiced anymore. There's not much point having open software if people aren't actually looking!
As an aside, criminal hackers know all about overflow errors and might be more motivated to find them than innocent developers. I fear that the Heartbleed overflow bug could have been noticed very quickly by hackers who pore over new releases looking for exactly this type of exploit, or equally by the NSA which is reported to have known about it from the beginning.
Where does this leave systems integrators and enterprise developers? Have they become accustomed to taking Open Source Software modules and building them in, without a whole lot of regression testing? There's a lot to be said for Free and Open Source Software (FOSS) but no enterprise can take "free" too literally; the total cost of development has to include reasonable specification, verification and testing of the integrated whole.
As discussed in the wake of Goto Fail, we need to urgently and radically lift coding standards.
'The widely publicised and very serious "gotofail" bug in iOS7 took me back ...
Early in my career I spent seven years in a very special software development environment. I didn't know it at the time, but this experience set the scene for much of my understanding of information security two decades later. I was in a team with a rigorous software development lifecycle; we attained ISO 9001 certification way back in 1998. My company deployed 30 software engineers in product development, 10 of whom were dedicated to testing. Other programmers elsewhere independently wrote manufacture test systems. We spent a lot of time researching leading edge development methodologies, such as Cleanroom, and formal specification languages like Z.
We wrote our own real time multi-tasking operating system; we even wrote our own C compiler and device drivers! Literally every single bit of the executable code was under our control. "Anal" doesn't even begin to describe our corporate culture.
Why all the fuss? Because at Telectronics Pacing Systems, over 1986-1990, we wrote the code for the world's first software controlled implantable defibrillator, the Guardian 4210.
The team spent relatively little time actually coding; we were mostly occupied writing and reviewing documents. And then there were the code inspections. We walked through pseudo-code during spec reviews, and source code during unit validation. And before finally shipping the product, we inspected the entire 40,000 lines of source code. That exercise took five people two months.
For critical modules, like the kernel and error correction routines, we walked through the compiled assembly code. We took the time to simulate the step-by-step operation of the machine code using pen and paper, each team member role-playing parts of the microprocessor (Phil would pretend to be the accumulator, Lou the program counter, me the index register). By the end of it all, we had several people who knew the defib's software like the back of their hand.
And we had demonstrably the most reliable real time software ever written. After amassing several thousand implant-years, we measured a bug rate of less than one in 10,000 lines.
The implant software team had a deserved reputation as pedants. Over 25 person years, the average rate of production was one line of debugged C per team member per day. We were painstaking, perfectionist, purist. And argumentative! Some of our debates were excruciating to behold. We fought over definitions of “verification” and “validation”; we disputed algorithms and test tools, languages and coding standards. We were even precious about code layout, which seemed to some pretty silly at the time.
Yet 20 years later, purists are looking good.
Last week saw widespread attention to a bug in Apple's iOS operating system which rendered website security impotent. The problem arose from a single superfluous line of code – an extra goto statement – that nullified checking of SSL connections, leaving users totally vulnerable to fake websites. The Twitterverse nicknamed the flaw #gotofail.
There are all sorts of interesting quality control questions in the #gotofail experience.
- Was the code inspected? Do companies even do code inspections these days?
- The extra goto was said to be a recent change to the source; if that's the case, what regression testing was performed on the change?
- How are test cases selected?
- For something as important as SSL, are there not test rigs with simulated rogue websites to stress test security systems before release?
There seem to have been egregious shortcomings at every level: code design, code inspection, and testing.
A lot of attention is being given to the code layout. The spurious goto is indented in such a way that it appears to be part of a branch, but it is not. If curly braces were used religiously, or if an automatic indenting tool was applied, then the bug would have been more obvious (assuming that the code gets inspected). I agree of course that layout and coding standards are important, but there is a much more robust way to make source code clearer.
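The layout problem is easier to see in code. The following sketch in C reproduces only the shape of the flaw, not Apple's source; `check_hash`, `check_signature` and both `verify_*` functions are hypothetical stand-ins, with 0 meaning success.

```c
/* Hypothetical stand-ins for the real checks; 0 means success. */
static int check_hash(int ok)      { return ok ? 0 : -1; }
static int check_signature(int ok) { return ok ? 0 : -1; }

/* Shape of the flaw: the second goto is indented as if it belonged to
 * the branch, but it always runs, so check_signature() is never reached. */
int verify_unbraced(int hash_ok, int sig_ok)
{
    int err;
    if ((err = check_hash(hash_ok)) != 0)
        goto fail;
        goto fail;   /* always executed; err is still 0 when the hash passed */
    if ((err = check_signature(sig_ok)) != 0)
        goto fail;
fail:
    return err;
}

/* Same logic with braces on every branch. A stray duplicate line would
 * either sit harmlessly inside a block or stand out visibly outside one. */
int verify_braced(int hash_ok, int sig_ok)
{
    int err;
    if ((err = check_hash(hash_ok)) != 0) {
        goto fail;
    }
    if ((err = check_signature(sig_ok)) != 0) {
        goto fail;
    }
fail:
    return err;
}
```

With a good hash and a forged signature, `verify_unbraced(1, 0)` reports success while `verify_braced(1, 0)` reports failure. That is exactly the kind of case a test rig with a simulated rogue server could be expected to catch.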
Beyond the lax testing and quality control, there is also a software-theoretic question in all this that is getting hardly any attention: Why are programmers using ANY goto statements at all?
I was taught at college and later at Telectronics to avoid goto statements at all cost. Yes, on rare occasions a goto statement makes the code more compact, but with care, a program can almost always be structured to be compact in other ways. Don't programmers care anymore about elegance in logic design? Don't they make efforts to set out their code in a rigorous structured manner?
The conventional wisdom is that goto statements make source code harder to understand, harder to test and harder to maintain. Kernighan and Ritchie - UNIX pioneers and authors of the classic C programming textbook - said the goto statement is "infinitely abusable" and suggested that it "be used sparingly if at all." Before them, one of programming's giants, Edsger Dijkstra, wrote in 1968 that "The go to statement ... is too much an invitation to make a mess of one's program"; see Go To Statement Considered Harmful. The goto creates spaghetti code. The landmark structured programming language PASCAL doesn't even have a goto statement! At Telectronics our coding standard prohibited without exception gotos in all implantable software.
Hard to understand, hard to test and hard to maintain is exactly what we see in the flawed iOS7 code. The critical bug never would have happened if Apple too banned the goto.
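As a rough illustration of what a goto-free structure can look like, the C sketch below runs a sequence of checks through a single loop with a single exit. It is not Apple's fix; the `handshake` struct, its fields and the function names are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical handshake state; the fields are illustrative only. */
struct handshake {
    int hash_ok;
    int signature_ok;
    int checks_run;   /* counts how many checks actually executed */
};

static int check_hash(struct handshake *h)
{
    h->checks_run++;
    return h->hash_ok ? 0 : -1;
}

static int check_signature(struct handshake *h)
{
    h->checks_run++;
    return h->signature_ok ? 0 : -1;
}

/* Runs every check in order and stops at the first failure.
 * There is no jump target that a duplicated line could reach early. */
int verify_structured(struct handshake *h)
{
    static int (*const checks[])(struct handshake *) = {
        check_hash,
        check_signature,
    };
    int err = 0;

    for (size_t i = 0; i < sizeof checks / sizeof checks[0] && err == 0; i++) {
        err = checks[i](h);
    }
    return err;
}
```

Accidentally duplicating a line in the table would only run a check twice. It could not skip one, which is the property the goto-based layout lacks.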
Now, I am hardly going to suggest that fanatical coding standards and intellectual rigor are sufficient to make software secure (see also "Security Isn’t Secure"). It's unlikely that many commercial developers will be able to cost-justify exhaustive code walkthroughs when millions of lines are involved even in the humble mobile phone. It’s not as if lives depend on commercial software.
Or do they?!
Let’s leave aside that vexed question for now and return to fundamentals.
The #gotofail episode will become a text book example of not just poor attention to detail, but moreover, the importance of disciplined logic, rigor, elegance, and fundamental coding theory.
A still deeper lesson in all this is the fragility of software. Prof Arie van Deursen nicely describes the iOS7 routine as "brittle". I want to suggest that all software is tragically fragile. It takes just one line of silly code to bring security to its knees. The sheer non-linearity of software – the ability for one line of software anywhere in a hundred million lines to have unbounded impact on the rest of the system – is what separates development from conventional engineering practice. Software doesn’t obey the laws of physics. No non-trivial software can ever be fully tested, and we have gone too far for the software we live with to be comprehensively proof read. We have yet to build the sorts of software tools and best practice and habits that would merit the title "engineering".
I’d like to close with a philosophical musing that might have appealed to my old mentors at Telectronics. Post-modernists today can rejoice that the real world has come to pivot precariously on pure text. It is weird and wonderful that technicians are arguing about the layout of source code – as if they are poetry critics.
We have come to depend daily on great obscure texts, drafted not by people we can truthfully call "engineers" but by a largely anarchic community we would be better off calling playwrights.
That is, information security is not intellectually secure. Almost every precept of orthodox information security is ready for a shake-up. Infosec practices are built on crumbling foundations.
UPDATE: I've been selected to speak on this topic at the 2014 AusCERT Conference - the biggest information security event in Australasia.
The recent tragic experience of data breaches -- at Target, Snapchat, Adobe Systems and RSA to name a very few -- shows that orthodox information security is simply not up to the task of securing serious digital assets. We have to face facts: no amount of today's conventional security is ever going to protect assets worth billions of dollars.
Our approach to infosec is based on old management process standards (which can be traced back to ISO 9000) and a ponderous technology neutrality that overly emphasises people and processes. The things we call "Information Security Management Systems" are actually not systems that any engineer would recognise but instead are flabby sets of documents and audit procedures.
"Continuous security improvement" in reality is continuous document engorgement.
Most ISMSs sit passively on shelves and share drives doing nothing for 12 months, until the next audit, when the papers become the centre of attention (not the actual security). Audit has become a sick joke. ISO 27000 and PCI assessors have the nerve to tell us their work only provides a snapshot, and if a breach occurs between visits, it's not their fault. In their words they admit therefore that audits do not predict performance between audits. While nobody is looking, our credit card numbers are about as secure as Schrodinger's Cat!
The deep problem is that computer systems have become so very complex and so very fragile that they are not manageable by traditional means. Our standard security tools, including Threat & Risk Assessment and hierarchical layered network design, are rooted in conventional engineering. Failure Modes & Criticality Analysis works well in linear systems, where small perturbations have small effects, but IT is utterly unlike this. The smallest most trivial omission in software or in a server configuration can have dire and unlimited consequences. It's like we're playing Jenga.
Update: Barely a month after I wrote this blog, we heard about the "goto fail" bug in the Apple iOS SSL routines, which resulted from one spurious line of code. It might have been more obvious to the programmer and/or any code reviewer had the code been indented differently or if curly braces were used rigorously.
Security needs to be re-thought from the ground up. We need some bigger ideas.
We need less rigid, less formulaic security management structures, to encourage people at the coal face to exercise their judgement and skill. We need straight talking CISOs with deep technical experience in how computers really work, and not 'suits' more focused on the C-suite than the dev teams. We have to stop writing impenetrable hierarchical security policies and SOPs (in the "waterfall" manner we recognised decades ago fails to do much good in software development). And we need to equate security with software quality and reliability, and demand that adequate time and resources be allowed for the detailed work to be done right.
If we can't protect credit card numbers today, we urgently need to do things differently, standing as we are on the brink of the Internet of Things.
I am speaking at next week's AusCERT security conference, on how to make privacy real for technologists. This is an edited version of my conference abstract.
Privacy by Design is a concept founded by the Ontario Privacy Commissioner Dr. Ann Cavoukian. Dubbed "PbD", it's basically the same good idea as designing in quality, or designing in security. It has caught on nicely as a mantra for privacy advocates worldwide. The trouble is, few designers or security professionals can tell what it means.
Privacy continues to be a bit of a jungle for security practitioners. It's not that they're uninterested in privacy; rather, it's rare for privacy objectives to be expressed in ways they can relate to. Only one of the 10 or 11 or more privacy principles we have in Australia is ever labelled "security" and even then, all it will say is security must be "reasonable" given the sensitivity of the Personal Information concerned. With this legalistic language, privacy is somewhat opaque to the engineering mind; security professionals naturally see it as meaning little more than encryption and maybe some access control.
To elevate privacy practice from the personal plane to the professional, we need to frame privacy objectives in a way that generates achievable design requirements. This presentation will showcase a new methodology to do this, by extending the familiar standardised Threat & Risk Assessment (TRA). A hybrid Privacy & Security TRA adds extra dimensions to the information asset inventory. Classically an information asset inventory accounts for the confidentiality, integrity and availability (C.I.A.) of each asset; the extended methodology goes further, to identify which assets represent Personal Information, and for those assets, lists privacy related attributes like consent status, accessibility and transparency. The methodology also broadens the customary set of threats to include over-collection, unconsented disclosure, incomplete responses to access requests, over-retention and so on.
The extended TRA methodology brings security and privacy practices closer together, giving real meaning to the goal of Privacy by Design. Privacy and security are sometimes thought to be in conflict, and indeed they often are. We should not sugar coat this; after all, systems designers are of course well accustomed to tensions between competing design objectives. To do a better job at privacy, security practitioners need new tools like the Security & Privacy TRA to surface the requirements in an actionable way.
The hybrid Threat & Risk Assessment
TRAs are widely practiced during requirements analysis stages of large information systems projects. There are a number of standards that guide the conduct of TRAs, such as ISO 31000. A TRA first catalogues all information assets controlled by the system, and then systematically explores all foreseeable adverse events that threaten those assets. Relative risk is then gauged, usually as a product of threat likelihood and severity, and the set of threats is prioritised according to importance. Threat mitigations are then considered and the expected residual risks calculated. An especially good thing about a formal TRA is that it presents management with the risk profile to be expected after the security program is implemented, and fosters consciousness of the reality that finite risks always remain.
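The arithmetic of the TRA can be sketched in a few lines of C. The 1 to 5 scales, the `mitigation` fraction and all the names below are assumptions made for illustration; real TRA standards leave the scoring scheme to the practitioner.

```c
#include <stdlib.h>

struct threat {
    const char *name;
    int likelihood;     /* assumed scale: 1 (rare) .. 5 (almost certain) */
    int severity;       /* assumed scale: 1 (minor) .. 5 (catastrophic) */
    double mitigation;  /* assumed model: fraction of risk removed, 0.0 .. 1.0 */
};

/* Relative risk as the product of likelihood and severity. */
int inherent_risk(const struct threat *t)
{
    return t->likelihood * t->severity;
}

/* Risk expected to remain once the planned control is in place. */
double residual_risk(const struct threat *t)
{
    return inherent_risk(t) * (1.0 - t->mitigation);
}

/* qsort comparator: highest inherent risk first. */
static int by_risk_desc(const void *a, const void *b)
{
    int ra = inherent_risk(a);
    int rb = inherent_risk(b);
    return (rb > ra) - (rb < ra);
}

/* Orders the threat register so the most important threats come first. */
void prioritise(struct threat *threats, size_t n)
{
    qsort(threats, n, sizeof threats[0], by_risk_desc);
}
```

Even in this toy model the residual risk only reaches zero when `mitigation` is exactly 1.0, which no real control achieves. That is the point about finite risks always remaining.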
The diagram below illustrates a conventional TRA workflow (yellow), plus the extensions to cover privacy design (red). The important privacy qualities of Personal Information assets include Accessibility, Permissibility (to disclose), Sensitivity (of e.g. health information), Transparency (of the reasons for collection) and Quality. Typical threats to privacy include over-collection (which can be an adverse consequence of excessive event logging or diagnostics), over-disclosure, incompleteness of records furnished in response to access requests, and over-retention of PI beyond the prima facie business requirement. When it comes to mitigating privacy threats, security practitioners may be pleasantly surprised to find that most of their building blocks are applicable.
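One way to picture the extended asset inventory is as a record that carries the usual C.I.A. ratings plus a handful of privacy attributes, with a routine that flags the privacy threats named above. The C sketch below is only that, a picture; every field name, flag and rule in it is a hypothetical simplification.

```c
#include <stdbool.h>

/* Privacy threats as bit flags so that several can be reported at once. */
enum privacy_threat {
    PT_OVER_COLLECTION        = 1,
    PT_UNCONSENTED_DISCLOSURE = 2,
    PT_OVER_RETENTION         = 4
};

struct asset {
    const char *name;
    /* Conventional security ratings (assumed scale 1..5). */
    int confidentiality;
    int integrity;
    int availability;
    /* Privacy extensions; only meaningful for Personal Information. */
    bool is_personal_info;
    bool purpose_documented;        /* is there a stated reason to collect it? */
    bool disclosed_to_third_party;
    bool consent_to_disclose;
    int days_held;
    int retention_limit_days;
};

/* Returns the set of privacy threats that apply to one asset. */
unsigned privacy_threats(const struct asset *a)
{
    unsigned found = 0;

    if (!a->is_personal_info) {
        return found;   /* privacy threats only attach to PI */
    }
    if (!a->purpose_documented) {
        found |= PT_OVER_COLLECTION;
    }
    if (a->disclosed_to_third_party && !a->consent_to_disclose) {
        found |= PT_UNCONSENTED_DISCLOSURE;
    }
    if (a->days_held > a->retention_limit_days) {
        found |= PT_OVER_RETENTION;
    }
    return found;
}
```

An audit log that records PI without a documented purpose and is kept past its retention limit would be flagged for both over-collection and over-retention, even if its confidentiality controls are excellent.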
The hybrid Security-Privacy Threat & Risk Assessment will help ICT practitioners put Privacy by Design into practice. It helps reduce privacy principles to information systems engineering requirements, and surfaces potential tensions between security practices and privacy. ICT design frequently deals with competing requirements. When engineers have the right tools, they can deal properly with privacy.
I have come to believe that a systemic conceptual shortfall affects typical technologists' thinking about privacy. It may be that engineers tend to take literally the well-meaning slogan that "privacy is not a technology issue". I say this in all seriousness.
Online, we're talking about data privacy, or data protection, but systems designers tend to bring to work a spectrum of personal outlooks about privacy in the human sphere. Yet what matters is the precise wording of data privacy law, like Australia's Privacy Act. To illustrate the difference, here's the sort of experience I've had time and time again.
During the course of conducting a PIA in 2011, I spent time with the development team working on a new government database. These were good, senior people, with sophisticated understanding of information architecture. But they harboured restrictive views about privacy. An important clue was the way they referred to "private" information rather than Personal Information (or equivalently, Personally Identifiable Information, PII). After explaining that Personal Information is the operable term in Australian legislation, and reviewing its definition from the Privacy Act, we found that the team had failed to appreciate the extent of the PI in their system. They overlooked that most of their audit logs collect PI, albeit indirectly and automatically. Further, they had not appreciated that information about clients in their register provided by third parties was also PI (despite it being intuitively "less private" by virtue of originating from others). I attributed these blind spots to the developers' weak and informal frame of "private" information. Online and in data privacy law alike, things are very crisp. The definition of Personal Information -- namely any data relating to an individual whose identity is readily apparent -- sets a low bar, embracing a great many data classes and, by extension, informatics processes. It's a nice analytical definition that is readily factored into systems analysis. After the team got that, the PIA in question proceeded apace and we found and rectified several privacy risks that had gone unnoticed.
Here are some more of the many recurring misconceptions I've noticed over the past decade:
- "Personal" Information is sometimes taken to mean especially delicate information such as payment card details, rather than any information pertaining to an identifiable individual, such as email addresses in many cases; an exchange between US data breach analyst Jake Kouns and me over the Epsilon incident in 2011 is revealing of technologists' systemically narrow idea of PII;
- the act of collecting PI is sometimes regarded only in relation to direct collection from the individual concerned; technologists can overlook that PI provided by a third party to a data custodian is nevertheless being collected by the custodian, and they can fail to appreciate that generating PI internally, through event logging for instance, can also represent collection;
- even if they are aware of points such as Australia's Access and Correction Principle, database administrators can be unaware that, technically, individuals requesting a copy of information held about them should also be provided with pertinent event logs; a non-trivial case where individuals can have a genuine interest in reviewing event logs is when they want to know if an organisation's staff have been accessing their records.
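The last point translates directly into a design requirement: an access request handler has to search the event logs as well as the primary records. A minimal sketch in C, assuming a hypothetical in-memory log in which each entry names the individual whose record was touched:

```c
#include <stddef.h>

/* Hypothetical event log entry; the field names are illustrative only. */
struct log_entry {
    int subject_id;      /* the individual whose record was touched */
    int staff_id;        /* the staff member who touched it */
    const char *action;  /* e.g. "view" or "update" */
};

/*
 * Gathers the log entries that relate to one individual, for inclusion
 * in the response to that individual's access request.
 * Stores up to out_cap pointers in `out` and returns the total number
 * of matching entries, which may be larger than out_cap.
 */
size_t collect_log_entries(const struct log_entry *log, size_t log_len,
                           int subject_id,
                           const struct log_entry **out, size_t out_cap)
{
    size_t matches = 0;

    for (size_t i = 0; i < log_len; i++) {
        if (log[i].subject_id == subject_id) {
            if (matches < out_cap) {
                out[matches] = &log[i];
            }
            matches++;
        }
    }
    return matches;
}
```

The returned entries show which staff members accessed the individual's records, which is the interest described in the last bullet above.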
These instances, among many others in my experience working across both information security and privacy, show that ICT practitioners suffer important gaps in their understanding. Security professionals in particular may be forgiven for thinking that most legislated Privacy Principles are legal niceties irrelevant to them, for generally only one of the principles in any given set is overtly about security; see:
- no. 5 of the eight OECD Privacy Principles
- no. 4 of the five Fair Information Practice Principles in the US
- no. 8 of the ten Generally Accepted Privacy Principles of the US and Canadian accounting bodies,
- no. 4 of the ten old National Privacy Principles of Australia, and
- no. 11 of the 13 new Australian Privacy Principles (APPs).
Yet every one of the privacy principles is impacted by information technology and security practices; see Mapping Privacy requirements onto the IT function, Privacy Law & Policy Reporter, Vol. 10.1 & 10.2, 2003. I believe the gaps in the privacy knowledge of ICT practitioners are not random but are systemic, probably resulting from privacy training for non-privacy professionals being ad hoc and not properly integrated with their particular world views.
To properly deal with data privacy, ICT practitioners need to have privacy framed in a way that leads to objective design requirements. Luckily there already exist several unifying frameworks for systematising the work of dev teams. One example that resonates strongly with data privacy practice is the Threat & Risk Assessment (TRA).
The TRA is an infosec requirements analysis tool, widely practiced in the public and private sectors. There are a number of standards that guide the conduct of TRAs, such as ISO 31000. A TRA is used to systematically catalogue all foreseeable adverse events that threaten an organisation's information assets, identify candidate security controls (considering technologies, processes and personnel) to mitigate those threats, and most importantly, determine how much should be invested in each control to bring all risks down to an acceptable level. The TRA process delivers real world management decisions, understanding that non zero risks are ever present, and that no organisation has an unlimited security budget.
I have found that in practice, the TRA exercise is readily extensible as an aid to Privacy by Design. A TRA can expressly incorporate privacy as an attribute of information assets worth protecting, alongside the conventional security qualities of confidentiality, integrity and availability ("C.I.A."). A crucial subtlety here is that privacy is not the same as confidentiality, yet many frequently conflate the two. A fuller understanding of privacy leads designers to consider the Collection, Use, Disclosure and Access & Correction principles, over and above confidentiality when they analyse information assets.
Lockstep continues to actively research the closer integration of security and privacy practices.