'The widely publicised and very serious "gotofail" bug in iOS7 took me back ...
Early in my career I spent seven years in a very special software development environment. I didn't know it at the time, but this experience set the scene for much of my understanding of information security two decades later. I was in a team with a rigorous software development lifecycle; we attained ISO 9001 certification way back in 1998. My company deployed 30 software engineers in product development, 10 of whom were dedicated to testing. Other programmers elsewhere independently wrote manufacture test systems. We spent a lot of time researching leading edge development methodologies, such as Cleanroom, and formal specification languages like Z.
We wrote our own real time multi-tasking operating system; we even wrote our own C compiler and device drivers! Literally every single bit of the executable code was under our control. "Anal" doesn't even begin to describe our corporate culture.
Why all the fuss? Because at Telectronics Pacing Systems, over 1986-1990, we wrote the code for the world's first software controlled implantable defibrillator, the Guardian 4210.
The team spent relatively little time actually coding; we were mostly occupied writing and reviewing documents. And then there were the code inspections. We walked through pseudo-code during spec reviews, and source code during unit validation. And before finally shipping the product, we inspected the entire 40,000 lines of source code. That exercise took five people two months.
For critical modules, like the kernel and error correction routines, we walked through the compiled assembly code. We took the time to simulate the step-by-step operation of the machine code using pen and paper, each team member role-playing parts of the microprocessor (Phil would pretend to be the accumulator, Lou the program counter, me the index register). By the end of it all, we had several people who knew the defib's software like the back of their hand.
And we had demonstrably the most reliable real time software ever written. After amassing several thousand implant-years, we measured a bug rate of less than one in 10,000 lines.
The implant software team had a deserved reputation as pedants. Over 25 person years, the average rate of production was one line of debugged C per team member per day. We were painstaking, perfectionist, purist. And argumentative! Some of our debates were excruciating to behold. We fought over definitions of “verification” and “validation”; we disputed algorithms and test tools, languages and coding standards. We were even precious about code layout, which seemed to some pretty silly at the time.
Yet 20 years later, purists are looking good.
Last week saw widespread attention to a bug in Apple's iOS operating system which rendered website security impotent. The problem arose from a single superfluous line of code – an extra goto statement – that nullified checking of SSL connections, leaving users totally vulnerable to fake websites. The Twitterverse nicknamed the flaw #gotofail.
There are all sorts of interesting quality control questions in the #gotofail experience.
- Was the code inspected? Do companies even do code inspections these days?
- The extra goto was said to be a recent change to the source; if that's the case, what regression testing was performed on the change?
- How are test cases selected?
- For something as important as SSL, are there not test rigs with simulated rogue websites to stress test security systems before release?
There seems to have been egregious shortcomings at every level : code design, code inspection, and testing.
A lot of attention is being given to the code layout. The spurious goto is indented in such a way that it appears to be part of a branch, but it is not. If curly braces were used religiously, or if an automatic indenting tool was applied, then the bug would have been more obvious (assuming that the code gets inspected). I agree of course that layout and coding standards are important, but there is a much more robust way to make source code clearer.
Beyond the lax testing and quality control, there is also a software-theoretic question in all this that is getting hardly any attention: Why are programmers using ANY goto statements at all?
I was taught at college and later at Telectronics to avoid goto statements at all cost. Yes, on rare occasions a goto statement makes the code more compact, but with care, a program can almost always be structured to be compact in other ways. Don't programmers care anymore about elegance in logic design? Don't they make efforts to set out their code in a rigorous structured manner?
The conventional wisdom is that goto statements make source code harder to understand, harder to test and harder to maintain. Kernighan and Ritchie - UNIX pioneers and authors of the classic C programming textbook - said the goto statement is "infinitely abusable" and it "be used sparingly if at all." Before them, one of programming's giants, Edsger Dijkstra, wrote in 1968 that "The go to statement ... is too much an invitation to make a mess of one's program"; see Go To Statement Considered Harmful. The goto creates spaghetti code. The landmark structured programming language PASCAL doesn't even have a goto statement! At Telectronics our coding standard prohibited without exception gotos in all implantable software.
Hard to understand, hard to test and hard to maintain is exactly what we see in the flawed iOS7 code. The critical bug never would have happened if Apple too banned the goto.
Now, I am hardly going to suggest that fanatical coding standards and intellectual rigor are sufficient to make software secure (see also "Security Isn’t Secure). It's unlikely that many commercial developers will be able to cost-justify exhaustive code walkthroughs when millions of lines are involved even in the humble mobile phone. It’s not as if lives depend on commercial software.
Or do they?!
Let’s leave aside that vexed question for now and return to fundamentals.
The #gotofail episode will become a text book example of not just poor attention to detail, but moreover, the importance of disciplined logic, rigor, elegance, and fundamental coding theory.
A still deeper lesson in all this is the fragility of software. Prof Arie van Deursen nicely describes the iOS7 routine as "brittle". I want to suggest that all software is tragically fragile. It takes just one line of silly code to bring security to its knees. The sheer non-linearity of software – the ability for one line of software anywhere in a hundred million lines to have unbounded impact on the rest of the system – is what separates development from conventional engineering practice. Software doesn’t obey the laws of physics. No non-trivial software can ever be fully tested, and we have gone too far for the software we live with to be comprehensively proof read. We have yet to build the sorts of software tools and best practice and habits that would merit the title "engineering".
I’d like to close with a philosophical musing that might have appealed to my old mentors at Telectronics. Post-modernists today can rejoice that the real world has come to pivot precariously on pure text. It is weird and wonderful that technicians are arguing about the layout of source code – as if they are poetry critics.
We have come to depend daily on great obscure texts, drafted not by people we can truthfully call "engineers" but by a largely anarchic community we would be better of calling playwrights.
With a bunch of exciting new members joining up on the eve of the RSA Conference, the FIDO Alliance is going from strength to strength. And they've just published the first public review drafts of their core "universal authentication" protocols.
An update to my Constellation Research report on FIDO is now available. Here's a preview.
The Go-To standards alliance in protocols for modern identity management
The FIDO Alliance – for Fast IDentity Online – is a fresh, fast growing consortium of security vendors and end users working out a new suite of protocols and standards to connect authentication endpoints to services. With an unusual degree of clarity in this field, FIDO envisages simply "doing for authentication what Ethernet did for networking".
Launched in early 2013, the FIDO Alliance has already grown to nearly 100 members, amongst which are heavyweights like Google, Lenovo, MasterCard, Microsoft and PayPal as well as a couple of dozen biometrics vendors, many of the leading Identity and Access Management solutions and service providers and several global players in the smartcard supply chain.
FIDO is different. The typical hackneyed elevator pitch in Identity and Access Management promises to "fix the password crisis" – usually by changing the way business is done. Most IDAM initiatives unwittingly convert clear-cut technology problems into open-ended business transformation problems. In contrast, FIDO's mission is refreshingly clear cut: it seeks to make strong authentication interoperable between devices and servers. When users have activated FIDO-compliant endpoints, reliable fine-grained information about their client environment becomes readily discoverable by any servers, which can then make access control decisions, each according to its own security policy.
With its focus, pragmatism and critical mass, FIDO is justifiably today's go-to authentication standards effort.
In February 2014, the FIDO Alliance announced the release of its first two protocol drafts, and a clutch of new members including powerful players in financial services, the cloud and e-commerce. Constellation notes in particular the addition to the board of security leader RSA and another major payments card, Discover. And FIDO continues to strengthen its vital “Relying Party” (service provider) representation with the appearance of Aetna, Goldman Sachs, Netflix and Salesforce.com.
It's time we fixed the Authentication plumbing
In my view, the best thing about FIDO is that it is not about federated identity but instead it operates one layer down in what we call the digital identity stack. This might seem to run against the IDAM tide, but it's refreshing, and it may help the FIDO Alliance sidestep the quagmire of identity policy mapping and legal complexities. FIDO is not really about the vexed general issue of "identity" at all! Instead, it's about low level authentication protocols; that is, the plumbing.
The FIDO Alliance sets out its mission as follows:
- Change the nature of online authentication by:
- Developing technical specifications that define an open, scalable, interoperable set of mechanisms that reduce the reliance on passwords to authenticate users.
- Operating industry programs to help ensure successful worldwide adoption of the Specifications.
- Submitting mature technical Specification(s) to recognized standards development organization(s) for formal standardization.
The engineering problem underlying Federated Identity is actually pretty simple: if we want to have a choice of high-grade physical, multi-factor "keys" used to access remote services, how do we convey reliable cues to those services about the type of key being used and the individual who's said to be using it? If we can solve that problem, then service providers and Relying Parties can sort out for themselves precisely what they need to know about the users, sufficient to identify and authenticate them.
All of these leaves the 'I' in the acronym "FIDO" a little contradictory. It's such a cute name (alluding of course to the Internet dog) that it's unlikely to change. Instead, I overheard that the acronym might go the way of "KFC" where eventually it is no longer spelled out and just becomes a word in and of itself.
FIDO Alliance Board Members
- CrucialTec (manufactures innovative user input devices for mobiles)
- Discover Card
- Nok Nok Labs (a specialist authentication server software company)
- NXP Semiconductors (a global supplier of card chips, SIMs and Secure Elements)
- Oberthur Technologies (a multinational smartcard and mobility solutions provider)
- Synaptics (fingerprint biometrics)
- Yubico (the developer of the YubiKey PKI enabled 2FA token).
FIDO Alliance Board Sponsor Level Members
- EyeLock Inc.
- Fingerprint Cards AB
- Goldman Sachs
- IDEX ASA
- Next Biometrics Group
- Oesterreichische Staatsdruckerei GmbH
- Ping Identity
- Wave Systems
Stay tuned for the updated Constellation Research report.
If anonymity is important, what is the legal basis for defending it?
I find that conventional data privacy law in most places around the world already protects anonymity, insofar as the act of de-anonymization represents an act of PII Collection - the creation of a named record. As such, de-anonymization cannot be lawfully performed without an express need to to do, or consent.
Cynics have been asking the same rhetorical question "is privacy dead?" for at least 40 years. Certainly information technology and ubiquitous connectivity have made it nearly impossible to hide, and so anonymity is critically ill. But privacy is not the same thing as secrecy; privacy is a state where those who know us, respect the knowledge they have about us. Privacy generally doesn't require us hiding from anyone; it requires restraint on the part of those who hold Personal Information about us.
The typical public response to data breaches, government surveillance and invasions like social media facial recognition is vociferous. People in general energetically assert their rights to not be tracked online, or to have their personal information exploited behind their backs. These reactions show that the idea of privacy alive and well.
The end of anonymity perhaps
Against a backdrop of spying revelations and excesses by social media companies especially in regards to facial recognition, there have been recent calls for a "new jurisprudence of anonymity"; see Yale law professor Jed Rubenfeld writing in the Washington Post of 13 Jan 2014. I wonder if there is another way to crack the nut? Because any new jurisprudence is going to take a very long time.
Instead, I suggest we leverage the way most international privacy law and privacy experience -- going back decades -- is technology neutral with regards to the method of collection. In some jurisdictions like Australia, the term "collection" is not even defined in privacy law. Instead, the law just uses the normal plain English sense of the word, when it frames principles like Collection Limitation: basically, you are not allowed to collect (by any means) Personally Identifiable Information without a good reasonable express reason. It means that if PII gets into a data system, the system is accountable under privacy law for that PII, no matter how it got there.
This technology neutral view of PII collection has satisfying ramifications for all the people who intuit that Big Data has got too "creepy". We can argue that if a named record is produced afresh by a Big Data process (especially if that record is produced without the named person being aware of it, and from raw data that was originally collected for some other purpose) then that record has logically been collected. Whether PII is collected directly, or collected indirectly, or is in fact created by an obscure process, privacy law is largely agnostic.
Prof Rubenfeld wrote:
- "The NSA program isn’t really about gathering data. It's about mining data. All the data are there already, digitally stored and collected by telecom giants, just waiting." [italics in original]
I suggest that the output of the data mining, if it is personally identifiable and especially if it has been rendered identifiable by processing previously anonymous raw data, has is a fresh collection by the mining operation. As such, the miners should be accountable for their newly minted PII, just as though they had collected gathered it directly from the persons concerned.
For now, I don't want to go further and argue the rights and wrongs of surveillance. I just want to show a new way to frame the privacy questions in surveillance and big data, making use of existing jurisprudence. If I am right and the NSA is in effect collecting PII as it goes about its data mining, then that provides a possibly fresh understanding of what's going on, within which we can objectively analyse the rights and wrongs.
I am actually the first to admit that within this frame, the NSA might still be justified in mining data, and there might be no actual technical breach of information privacy law, if for instance the NSA enjoys a law enforcement exemption. These are important questions that need to be debated, but elsewhere (see my recent blog on our preparedness to actually have such a debate). My purpose right now is to frame a way to defend anonymity using as much existing legal infrastructure as possible.
But Collection is not limited everywhere
There is an important legal-technical question in all this: Is the collection of PII actually regulated? In Europe, Australia, New Zealand and in dozens of countries, collection is limited, but in the USA, there is no general restriction against collecting PII. America has no broad data protection law, and in any case, the Fair Information Practice Principles (FIPPs) don't include a Collection Limitation principle.
So there may be few regulations in the USA that would carry my argument there! Nevertheless, surely we can use international jurisprudence in Collection Limitation instead of creating new American jurisprudence around anonymity?
So I'd like to put the following questions Jed Rubenfeld:
- Do technology neutral Collection Limitation Principles in theory provide a way to bring de-anonymised data into scope for data privacy laws? Is this a way to address peoples' concerns with Big Data?
- How does international jurisprudence around Collection Limitation translate to American schools of legal thought?
- Does this way of looking at the problem create new impetus for Collection Limitation to be introduced into American privacy principles, especially the FIPPs?
Appendix: "Applying Information Privacy Norms to Re-Identification"
In 2013 I presented some of these ideas to an online symposium at the Harvard Law School Petrie-Flom Center, on the Law, Ethics & Science of Re-identification Demonstrations. What follows is an extract from that presentation, in which I spell out carefully the argument -- which was not obvious to some at the time -- that when genetics researchers combine different data sets to demonstrate re-identification of donated genomic material, they are in effect collecting patient PII. I argue that this type of collection should be subject to ethics committee approval just as if the researchers were collecting the identities from the patients directly.
... I am aware of two distinct re-identification demonstrations that have raised awareness of the issues recently. In the first, Yaniv Erlich [at MIT's Whitehead Institute] used what I understand are new statistical techniques to re-identify a number of subjects that had donated genetic material anonymously to the 1000 Genomes project. He did this by correlating genes in the published anonymous samples with genes in named samples available from genealogical databases. The 1000 Genomes consent form reassured participants that re-identification would be "very hard". In the second notable demo, Latanya Sweeney re-identified volunteers in the Personal Genome Project using her previously published method of using a few demographic values (such as date or birth, sex and postal code) extracted from the otherwise anonymous records.
A great deal of the debate around these cases has focused on the consent forms and the research subjects’ expectations of anonymity. These are important matters for sure, yet for me the ethical issue in de-anonymisation demonstrations is more about the obligations of third parties doing the identification who had nothing to do with the original informed consent arrangements. The act of recording a person’s name against erstwhile anonymous data represents a collection of personal information. The implications for genomic data re-identification are clear.
Let’s consider Subject S who donates her DNA, ostensibly anonymously, to a Researcher R1, under some consent arrangement which concedes there is a possibility that S will be re-identified. And indeed, some time later, an independent researcher R2 does identify S and links her to the DNA sample. The fact is that R2 has collected personal information about S. If R2 has no relationship with S, then S has not consented to this new collection of her personal information.
Even if the consent form signed at the time of the original collection includes a disclaimer that absolute anonymity cannot be guaranteed, re-identifying the DNA sample later represents a new collection, one that has been undertaken without any consent. Given that S has no knowledge of R2, there can be no implied consent in her original understanding with R1, even if absolute anonymity was disclaimed.
Naturally the re-identification demonstrations have served a purpose. It is undoubtedly important that the limits of anonymity be properly understood, and the work of Yaniv and Latanya contribute to that. Nevertheless, these demonstrations were undertaken without the knowledge much less the consent of the individuals concerned. I contend that bioinformaticians using clever techniques to attach names to anonymous samples need ethics approval, just as they would if they were taking fresh samples from the people concerned.
See also my letter to the editor of Science magazine.