I've just completed a major new Constellation Research report looking at how today's privacy practices cope with Big Data. The report draws together my longstanding research on the counter-intuitive strengths of technology-neutral data protection laws, and melds it with my new Constellation colleagues' vast body of work in data analytics. The synergy is honestly exciting and illuminating.
Big Data promises tremendous benefits for a great many stakeholders but the potential gains are jeopardised by the excesses of a few. Some cavalier online businesses are propelled by a naive assumption that data in the "public domain" is up for grabs, and with that they often cross a line.
For example, there are apps and services now that will try to identify pictures you take of strangers in public, by matching them biometrically against data supersets compiled from social networking sites and other publicly accessible databases. Many find such offerings quite creepy but they may be at a loss as to what to do about it, or even how to think through the issues objectively. Yet the very metaphor of data mining holds some of the clues. If, as some say, raw data is like crude oil, just waiting to be mined and exploited by enterprising prospectors, then surely there are limits, akin to mining permits?
Many think the law has not kept pace with technology, and that digital innovators are free to do what they like with any data they can get their hands on. But technologists repeatedly underestimate the strength of conventional data protection laws and regulations. The extraction of PII from raw data may be interpreted under technology-neutral privacy principles as an act of Collection and as such is subject to existing statutes. Around the world, Google thus found they are not actually allowed to gather Personal Data that happens to be available in unencrypted Wi-Fi transmission as StreetView cars drive by homes and offices. And Facebook found they are not actually allowed to automatically identify people in photos through face recognition without consent. And Target probably would find, if they tried it outside the USA, that they cannot flag selected female customers as possibly pregnant by analysing their buying habits.
On the other hand, orthodox privacy policies and static user agreements do not cater for the way personal data can be conjured tomorrow from raw data collected today. Traditional privacy regimes require businesses to spell out what personally identifiable information (PII) they collect and why, and to restrict secondary usage. Yet with Big Data, with the best will in the world, a company might not know what data analytics will yield down the track. If mutual benefits for business and customer alike might be uncovered, a freeze-frame privacy arrangement may be counter-productive.
Thus the fit between data analytics and data privacy standards is complex and sometimes surprising. While existing laws are not to be underestimated, we do need something new. As far as I know it was Ray Wang in his Harvard Business Review blog who first called for a fresh privacy compact amongst users and businesses.
The spirit of data privacy is simply framed: organisations that know us should respect the knowledge they have, they should be open about what they know, and they should be restrained in what they do with it. In the Age of Big Data, let's have businesses respect the intelligence they extract from data mining, just as they should respect the knowledge they collect directly through forms and questionnaires.
I like the label "Big Privacy"; it is grandly optimistic, like "Big Data" itself, and at the same time implies a challenge to do better than regular privacy practices.
Ontario Privacy Commissioner Dr Ann Cavoukian writes about Big Privacy, describing it simply as "Privacy By Design writ large". But I think there must be more to it than that. Big Data is quantitatively but also qualitatively different from ordinary data analysis.
To summarise the basic elements of a Big Data compact:
- Respect and Restraint: In the face of Big Data’s temptations, remember that privacy is not only about what we do with PII; just as important is what we choose not to do.
- Super transparency: Who knows what lies ahead in Big Data? If data privacy means being open about what PII is collected and why, then advanced privacy means going further, telling people more about the business models and the sorts of results data mining is expected to return.
- Engage customers in a fair deal for PII: Information businesses ought to set out what PII is really worth to them (especially when it is extracted in non-obvious ways from raw data) and offer a fair "price" for it, whether in the form of "free" products and services, or explicit payment.
- Really innovate in privacy: There’s a common refrain that “privacy hampers innovation” but often that's an intellectually lazy cover for reserving the right to strip-mine PII. Real innovation lies in business practices which create and leverage PII while honoring privacy principles.
My report, "Big Privacy" Rises to the Challenges of Big Data, may be downloaded from the Constellation Research website.
For the second time in as many months, a grave bug has emerged in core Internet security software. In February it was the "Goto Fail" bug in the Apple operating system iOS that left web site security inoperable; now we have "Heartbleed", a flaw that leaves many secure web servers in fact open to attackers sniffing memory contents looking for passwords and keys.
Who should care?
There is no shortage of advice on what to do if you're a user. And it's clear how to remediate the Heartbleed bug if you're a web administrator (a fix has been released). But what is the software fraternity going to do to reduce the incidence of these disastrous human errors? In my view, Goto Fail and Heartbleed are emblematic of chaotic software craftsmanship. It appears that goto statements are used with gay abandon throughout web software today, creating exactly the unmaintainable spaghetti code that the founders of Structured Programming warned us about in the 1970s. Testing is evidently lax; code inspection seems non-existent. The Heartbleed flaw is in a piece of widely used Open Source software, and was overlooked first by the programmer, and then by the designated inspector, and then it went unnoticed for two years in the wild.
What are the ramifications of Heartbleed?
"Heartbleed" is a flaw in an obscure low level feature of the "Transport Layer Security" (TLS) protocol. TLS has an optional feature dubbed "Heartbeat" which a computer connected in a secure session can use to periodically test if the other computer is still alive. Heartbeat involves sending a request message with some dummy payload, and getting back a response with duplicate payload. The bug in Heartbeat means the responding computer can be tricked into sending back a dump of 64 kiloytes of memory, because the payload length variable goes unchecked. (For the technically minded, this error is qualitatively similar to a buffer overload; see also the OpenSSL Project description of the bug). Being server memory used in security management, that random grab has a good chance of including sensitive TLS-related data like passwords, credit card numbers and even TLS session keys. The bug is confined to the OpenSSL security library, where it was introduced inadvertently as part of some TLS improvements in late 2011.
The flawed code is present in almost all Open Source web servers, or around 66% of all web servers worldwide. However not all servers on the Internet run SSL/TLS secure sessions. Security experts Netcraft run automatic surveys and have worked out that around 17% of all Internet sites would be affected by Heartbleed – or around half a million widely used addresses. These include many banks, financial services, government services, social media companies and employer extranets. An added complication is that the Heartbeat feature leaves no audit trail, and so a Heartbleed exploit is undetectable.
If you visit an affected site and start a secure ("padlocked") session, then an attacker that knows about Heartbleed can grab random pieces of memory from your session. Researchers have demonstrated that session keys can be retrieved, although it is said to be difficult. Nevertheless, Heartbleed has been described by some of the most respected and prudent commentators as catastrophic. Bruce Schneier for one rates its seriousness as "11 out of 10".
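To make the missing check concrete, here is a minimal sketch in C of a heartbeat responder that refuses to echo more bytes than it actually received. It is an illustration only, not OpenSSL's code: the `heartbeat_request` structure and the `build_heartbeat_response` function are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified heartbeat request (not OpenSSL's real structures). */
struct heartbeat_request {
    const unsigned char *payload;  /* bytes that actually arrived */
    size_t received_len;           /* how many bytes really arrived */
    size_t claimed_len;            /* length field supplied by the sender */
};

/*
 * Copy the payload to be echoed into `out`.
 * Returns the number of bytes written, or 0 if the request is rejected
 * because the claimed length exceeds what was received (the kind of check
 * that was missing) or because the output buffer is too small.
 */
size_t build_heartbeat_response(const struct heartbeat_request *req,
                                unsigned char *out, size_t out_cap)
{
    assert(req != NULL && out != NULL);

    if (req->claimed_len > req->received_len)
        return 0;  /* sender overstated the length: do not read past the payload */
    if (req->claimed_len > out_cap)
        return 0;  /* response would not fit in the caller's buffer */

    memcpy(out, req->payload, req->claimed_len);
    return req->claimed_len;
}
```

Without the first comparison, the `memcpy` would read whatever happens to sit in memory after the real payload, which is the essence of the leak.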
Should we panic?
No. The first rule in any emergency is "Don't Panic". But nevertheless, this is an emergency.
The risk of any individual having been harmed through Heartbleed is probably low, but the consequences are potentially grave (if for example your bank is affected). And soon enough, it will be simple and cheap to take action, so you will hear experts say 'it is prudent to assume you have been compromised' and to change your passwords.
However, you need to wait rather than rush into premature action. Until the websites you use have been fixed, changing passwords now may leave you more vulnerable, because it's highly likely that criminals are trying to exploit Heartbleed while they can. It's best to avoid using any secure websites for the time being. We should redouble the usual Internet precautions: check your credit card and bank statements (but not online for the time being!). Stay extra alert to suspicious looking emails not just from strangers but from your friends and colleagues too, for their cloud mail accounts might have been hacked. And seek out the latest news from your e-commerce sites, banks, government and so on. The Australian banks for instance were relatively quick to act; by April 10 the five biggest institutions confirmed they were safe.
Lessons for the Software Craft
Heartbleed for me is the last straw. I call it pathetic that mission critical code can harbour flaws like this. So for a start, in the interests of clarity, I will no longer use the term "Software Engineering". I've written a lot in the past about the practice and the nascent profession of programming but it seems we're just going backwards. I'm aware that calling programming a "craft" will annoy some people; honestly, I mean no offence to basket weavers.
I'm no merchant of doom. I'm not going to stop banking and shopping online (however I do refuse Internet facing Electronic Health Records, and I would not use a self-drive car). My focus is on software development processes and system security.
The modern world is increasingly dependent on software, so it passes understanding that we still tolerate such ad hoc development processes.
The programmer responsible for the Heartbleed bug has explained that he made a number of changes to the code and that he "missed validating a variable" (referring to the unchecked length of the Heartbeat payload). The designated reviewer of the OpenSSL changes also missed that the length was not validated. The software was released into the wild in March 2012. It went unnoticed (well, unreported) until a few weeks ago and was rectified in an OpenSSL release on April 7.
I'd like to avoid apportioning individual blame, so I am not interested in the names of the programmer and the reviewer. But we have to ask: when so many application security issues boil down to overflow problems, why is it not second nature to watch out for bugs like Heartbleed? How did experienced programmers make such an error? Why was this flaw out in the wild for two years before it was spotted? I thought one of the core precepts of Open Source Software was that having many eyes looking over the code means that errors will be picked up. But code inspection seems not to be widely practiced anymore. There's not much point having open software if people aren't actually looking!
As an aside, criminal hackers know all about overflow errors and might be more motivated to find them than innocent developers. I fear that the Heartbleed overflow bug could have been noticed very quickly by hackers who pore over new releases looking for exactly this type of exploit, or equally by the NSA which is reported to have known about it from the beginning.
Where does this leave systems integrators and enterprise developers? Have they become accustomed to taking Open Source Software modules and building them in, without a whole lot of regression testing? There's a lot to be said for Free and Open Source Software (FOSS) but no enterprise can take "free" too literally; the total cost of development has to include reasonable specification, verification and testing of the integrated whole.
As discussed in the wake of Goto Fail, we need to urgently and radically lift coding standards.
This blog is an edited extract from an article of the same name, first published in the Journal of Internet Banking and Commerce, December 2012, vol. 17, no. 3.
The credit card payments system is a paragon of standardisation. No other industry has such a strong history of driving and adopting uniform technologies, infrastructure and business processes. No matter where you keep a bank account, you can use a globally branded credit card to go shopping in almost every corner of the world. Seamless convenience is underpinned by the universal Four Party settlement model, and a long-standing card standard that works the same with ATMs and merchant terminals everywhere.
So with this determination to facilitate trustworthy and supremely convenient spending everywhere, it’s astonishing that the industry is still yet to standardise Internet payments. Most of the world has settled on the EMV standard for in-store transactions, but online we use a wide range of confusing and largely ineffective security measures. As a result, Card Not Present (CNP) fraud is growing unchecked. This article argues that all card payments should be properly secured using standardised hardware. In particular, CNP transactions should use the very same EMV chip and cryptography as do card present payments.
Skimming and Carding
With “carding”, criminals replicate stolen customer data on blank cards and use those card copies in regular merchant terminals. “Skimming” is one way of stealing card data, by running a card through a copying device when the customer isn’t looking (but it’s actually more common for card data to be stolen in bulk from compromised merchant and processor databases).
A magnetic stripe card stores the customer’s details as a string of ones and zeroes, and presents them to a POS terminal or ATM in the clear. It’s child’s play for criminals to scan the bits and copy them to a blank card.
The industry responded to skimming and carding with EMV (aka Chip-and-PIN). EMV replaces the magnetic storage with an integrated circuit, but more importantly, it secures the data transmitted from card to terminal. EMV works by first digitally signing those ones and zeros in the chip, and then verifying the signature at the terminal. The signing uses a Private Key unique to the cardholder and held safely inside the chip where it cannot be tampered with by fraudsters. It is not feasible to replicate the digital signature without having access to the inner workings of the chip, and thus EMV cards resist carding.
Online Card Fraud
Conventional Card Not Present (CNP) transactions are vulnerable because, a lot like the old mag stripe cards, they rest on clear text cardholder data. On its own, a merchant server cannot tell the difference between the original card data and a copy, just as a terminal cannot tell an original mag stripe card from a criminal's copy.
So CNP fraud is just online carding.
Despite the simplicity of the root problem, the past decade has seen a bewildering patchwork of flimsy and expensive online payments fixes. Various One Time Passwords have come and gone, from scratchy cards to electronic key fobs. Temporary SMS codes have been popular but were recently declared unsafe by the Communications Alliance in Australia, a policy body representing the major mobile carriers.
Meanwhile, extraordinary resources have been squandered on the novel “3D Secure” scheme (MasterCard “SecureCode” and “Verified by Visa”). 3D Secure take-up is piecemeal, and the scheme is widely derided by merchants and customers alike. It is often blocked by browsers, and it throws up odd looking messages that can look like a phishing attack or some other malfunction. Moreover, it upsets the underlying Four Party settlements architecture, slowing transactions to a crawl and introducing untold legal complexities. Payments regulators too appear to have lost interest in 3D Secure.
So why doesn’t the card payments industry go back to its roots, preserve its global Four Party settlement architecture and standards, and tackle the real issue?
Kill two birds with one chip
We could stop most online fraud by using the same chip technologies we deployed to kill off skimming and carding.
It is technically simple to reproduce the familiar card-present user experience in a standard computer. It would just take the will of the financial services industry to make payments by smartcard standard. Computers with built-in smartcard readers have come and gone; they're commonplace in some Eastern European and Asian markets where smartcards are normal for e-health and online voting.
With dual interface and contactless smartcards, the interface options open right up. The Dell E series Latitudes have contactless card readers as standard (aimed at the US Personal ID Verification PIV market). More importantly, most mobile devices now feature NFC or “Near Field Communications”, a special purpose device-to-device networking capability, which until now has mostly been used to emulate a payment card. NFC tablets and smartphones can also switch into reader emulation mode, so as to act as a smartcard terminal. Other researchers have recently demonstrated how to read a smartcard via NFC to authenticate the cardholder to a mobile device.
As an alternative, the SIM or other "Secure Element" of most mobile devices could be used to digitally sign card transactions directly, in place of the card. That’s essentially how NFC payment apps work for Card Present transactions – but nobody has yet made the leap to use smart phone hardware security for Card Not Present.
Using a smart payment card with a computer could and should be as easy as using Paywave or Paypass.
Conclusion: Hardware security
All serious payments systems use hardware security. The classic examples include SIM cards, EMV, the Hardware Security Modules mandated by regulators in all ATMs, and the Secure Elements of NFC devices. With well designed hardware security, we gain a lasting upper hand in the criminal arms race.
The Internet and mobile channels will one day overtake the traditional physical payments medium. Indeed, commentators already like to say that the “digital economy” is simply the economy. Therefore, let us stop struggling with stopgap Internet security measures, and let us stop pretending that PCI-DSS audits will stop organised crime stealing card numbers by the million. Instead, we should kill two birds with one stone, and use chip technology to secure both card present and CNP transactions, to deliver the same high standards of usability and security in all channels.
In one of the most highly anticipated sessions ever at the annual South-by-Southwest (SXSW) culture festival, NSA whistle blower Ed Snowden appeared via live video link from Russia. He joined two privacy and security champions from the American Civil Liberties Union – Chris Soghoian and Ben Wizner – to canvass the vexed tensions between intelligence and law enforcement, personal freedom, government accountability and digital business models.
These guys traversed difficult ground, with respect and much nuance. They agreed the issues are tough, and that proper solutions are non-obvious and slow-coming. The transcript is available here.
Yet afterwards the headlines and tweet stream were dominated by "Snowden's Tips" for personal online security. It was as if Snowden had been conducting a self-help workshop or a Cryptoparty. He was reported to recommend we encrypt our hard drives, encrypt our communications, and use Tor (the special free-and-open-source encrypted browser). These are mostly fine suggestions but I am perplexed why they should be the main takeaways from a complex discussion. Are people listening to Snowden's broader and more general policy lessons? I fear not. I believe people still conflate secrecy and privacy. At the macro level, the confusion makes it difficult to debate national security policy properly; at a micro level, even if crypto was practical for typical citizens, it is not a true privacy measure. Citizens need so much more than secrecy technologies, whether it's SSL-always-on at web sites, or do-it-yourself encryption.
Ed Snowden is a remarkably measured and thoughtful commentator on national security. Despite being hounded around the world, he is not given to sound bites. His principal concerns appear to be around public accountability, oversight and transparency. He speaks of the strengths and weaknesses of the governance systems already in place; he urges Congress to hold security agency heads to account.
When drawn on questions of technology, he doesn't dispense casual advice; instead he calls for multifaceted responses to our security dilemmas: more cryptological research, better random number generators, better testing, more robust cryptographic building blocks and more careful product design. Deep, complicated engineering stuff.
So how did the media, both mainstream and online alike, distill Snowden's sweeping analysis of politics, policy and engineering into three sterile and quasi-survivalist snippets?
Partly it's due to the good old sensationalism of all modern news media: everyone likes a David-and-Goliath angle where individuals face off against pitiless governments. And there's also the ruthless compression: newspapers cater for an audience with school-age reading levels and attention spans, and Twitter clips our contributions to 140 characters.
But there is also a deeper over-simplification of privacy going on which inhibits our progress.
Too often, people confuse privacy for secrecy. Privacy gets framed as a need to hide from prying eyes, and from that starting position, many advocates descend into a combative, everyone-for-themselves mindset.
However privacy has very little to do with secrecy. We shouldn't have to go underground to enjoy that fundamental human right to be let alone. The social reality is that most of us wish to lead rich and quite public lives. We actually want others to know us – to know what we do, what we like, and what we think – but all within limits. Digital privacy (or more clinically, data protection) is not about hiding; rather it is a state where those who know us are restrained in what they do with the knowledge they have about us.
Privacy is the protection you need when your affairs are not confidential!
So encryption is a sterile and very limited privacy measure. As the SXSW panellists agreed, today's encryption tools really are the preserve of deep technical specialists. Ben Wizner quipped that if the question is how can average users protect themselves online, and the answer is Tor, then "we have failed".
And the problems with cryptography are not just usability and customer experience. A fundamental challenge with the best encryption is that everyone needs to be running the tools. You cannot send out encrypted email unilaterally – you need to first make sure all your correspondents have installed the right software and they've got trusted copies of your encryption keys, or they won't be able to unscramble your messages.
Chris Soghoian also nailed the business problem that current digital revenue models are largely incompatible with encryption. The wondrous free services we enjoy from the Googles and Facebooks of the world are funded in the main by mining our data streams, figuring out our interests, habits and connections, and monetising that synthesised information. The web is in fact bankrolled by surveillance – by Big Business as opposed to government.
End-to-end encryption prevents data mining and would ruin the business model of the companies we've become attached to. If we were to get serious with encryption, we may have to cough up the true price for our modern digital lifestyles.
The SXSW privacy and security panellists know all this. Snowden in particular spent much of his time carefully reiterating many of the basics of data privacy. For instance he echoed the Collection Limitation Principle when he said of large companies that they "can't collect any data; [they] should only collect data and hold it for as long as necessary for the operation of the business". And the Openness Principle: "data should not be collected without people's knowledge and consent". If I was to summarise Snowden's SXSW presentation, I'd say privacy will only be improved by reforming the practices of both governments and big businesses, and by putting far more care into digital product development. Ed Snowden himself doesn't promote neat little technology tips.
It's still early days for the digital economy. We're experiencing an online re-run of the Wild West, with humble users understandably feeling forced to take measures into their own hands. So many individuals have become hungry for defensive online tools and tips. But privacy is more about politics and regulation than technology. I hope that people listen more closely to Ed Snowden on policy, and that his lasting legacy is more about legal reform and transparency than Do-It-Yourself encryption.
This is the abstract of a current privacy conference proposal.
Many Big Data and online businesses proceed on a naive assumption that data in the "public domain" is up for grabs; technocrats are often surprised that conventional data protection laws can be interpreted to cover the extraction of PII from raw data. On the other hand, orthodox privacy frameworks don't cater for the way PII can be created in future from raw data collected today. This presentation will bridge the conceptual gap between data analytics and privacy, and offer new dynamic consent models to civilize the trade in PII for goods and services.
It’s often said that technology has outpaced privacy law, yet by and large that's just not the case. Technology has certainly outpaced decency, with Big Data and biometrics in particular becoming increasingly invasive. However OECD data privacy principles set out over thirty years ago still serve us well. Outside the US, rights-based privacy law has proven effective against today's technocrats' most worrying business practices, based as they are on taking liberties with any data that comes their way. To borrow from Niels Bohr, technologists who are not surprised by data privacy have probably not understood it.
The cornerstone of data privacy in most places is the Collection Limitation principle, which holds that organizations should not collect Personally Identifiable Information beyond their express needs. It is the conceptual cousin of security's core Need-to-Know Principle, and the best starting point for Privacy-by-Design. The Collection Limitation principle is technology neutral and thus blind to the manner of collection. Whether PII is collected directly by questionnaire or indirectly via biometric facial recognition or data mining, data privacy laws apply.
The widely publicised and very serious "gotofail" bug in iOS7 took me back ...
Early in my career I spent seven years in a very special software development environment. I didn't know it at the time, but this experience set the scene for much of my understanding of information security two decades later. I was in a team with a rigorous software development lifecycle; we attained ISO 9001 certification way back in 1998. My company deployed 30 software engineers in product development, 10 of whom were dedicated to testing. Other programmers elsewhere independently wrote manufacture test systems. We spent a lot of time researching leading edge development methodologies, such as Cleanroom, and formal specification languages like Z.
We wrote our own real time multi-tasking operating system; we even wrote our own C compiler and device drivers! Literally every single bit of the executable code was under our control. "Anal" doesn't even begin to describe our corporate culture.
Why all the fuss? Because at Telectronics Pacing Systems, over 1986-1990, we wrote the code for the world's first software controlled implantable defibrillator, the Guardian 4210.
The team spent relatively little time actually coding; we were mostly occupied writing and reviewing documents. And then there were the code inspections. We walked through pseudo-code during spec reviews, and source code during unit validation. And before finally shipping the product, we inspected the entire 40,000 lines of source code. That exercise took five people two months.
For critical modules, like the kernel and error correction routines, we walked through the compiled assembly code. We took the time to simulate the step-by-step operation of the machine code using pen and paper, each team member role-playing parts of the microprocessor (Phil would pretend to be the accumulator, Lou the program counter, me the index register). By the end of it all, we had several people who knew the defib's software like the back of their hand.
And we had demonstrably the most reliable real time software ever written. After amassing several thousand implant-years, we measured a bug rate of less than one in 10,000 lines.
The implant software team had a deserved reputation as pedants. Over 25 person years, the average rate of production was one line of debugged C per team member per day. We were painstaking, perfectionist, purist. And argumentative! Some of our debates were excruciating to behold. We fought over definitions of “verification” and “validation”; we disputed algorithms and test tools, languages and coding standards. We were even precious about code layout, which seemed to some pretty silly at the time.
Yet 20 years later, purists are looking good.
Last week saw widespread attention to a bug in Apple's iOS operating system which rendered website security impotent. The problem arose from a single superfluous line of code – an extra goto statement – that nullified checking of SSL connections, leaving users totally vulnerable to fake websites. The Twitterverse nicknamed the flaw #gotofail.
There are all sorts of interesting quality control questions in the #gotofail experience.
- Was the code inspected? Do companies even do code inspections these days?
- The extra goto was said to be a recent change to the source; if that's the case, what regression testing was performed on the change?
- How are test cases selected?
- For something as important as SSL, are there not test rigs with simulated rogue websites to stress test security systems before release?
There seem to have been egregious shortcomings at every level: code design, code inspection, and testing.
A lot of attention is being given to the code layout. The spurious goto is indented in such a way that it appears to be part of a branch, but it is not. If curly braces were used religiously, or if an automatic indenting tool was applied, then the bug would have been more obvious (assuming that the code gets inspected). I agree of course that layout and coding standards are important, but there is a much more robust way to make source code clearer.
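The layout problem is easiest to see in miniature. The following C sketch is not Apple's source; it is a stripped-down illustration with hypothetical stand-in functions (`check_hash`, `check_signature`) showing how a duplicated goto, indented to look like part of a branch, skips every later check while reporting success, and how braces render the same slip harmless.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real verification steps; 0 means success. */
static int check_hash(bool ok)      { return ok ? 0 : -1; }
static int check_signature(bool ok) { return ok ? 0 : -1; }

/* Flawed shape: returns 0 ("trusted") even when the signature is bad. */
int verify_flawed(bool hash_ok, bool signature_ok)
{
    int err;

    if ((err = check_hash(hash_ok)) != 0)
        goto fail;
        goto fail;  /* duplicated line: not part of the if, always runs with err == 0 */
    if ((err = check_signature(signature_ok)) != 0)  /* never reached */
        goto fail;

fail:
    return err;
}

/* Braced shape: the same duplicated line stays inside the branch. */
int verify_braced(bool hash_ok, bool signature_ok)
{
    int err;

    if ((err = check_hash(hash_ok)) != 0) {
        goto fail;
        goto fail;  /* duplicate is unreachable and harmless here */
    }
    if ((err = check_signature(signature_ok)) != 0) {
        goto fail;
    }

fail:
    return err;
}
```

Calling `verify_flawed(true, false)` returns 0, so a bad signature is accepted, whereas `verify_braced(true, false)` returns an error. Some compilers can warn about the misleading indentation in the first function, but only if such warnings are switched on and heeded.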
Beyond the lax testing and quality control, there is also a software-theoretic question in all this that is getting hardly any attention: Why are programmers using ANY goto statements at all?
I was taught at college and later at Telectronics to avoid goto statements at all cost. Yes, on rare occasions a goto statement makes the code more compact, but with care, a program can almost always be structured to be compact in other ways. Don't programmers care anymore about elegance in logic design? Don't they make efforts to set out their code in a rigorous structured manner?
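As a rough illustration of what structuring the logic without goto can look like, here is a minimal C sketch. The function names (`step_hash`, `step_signature`, `run_checks`, `verify_structured`) are hypothetical, and the cleanup counter merely stands in for releasing resources; it is one possible arrangement, not a prescription.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical verification steps; each returns 0 on success. */
static int step_hash(bool ok)      { return ok ? 0 : -1; }
static int step_signature(bool ok) { return ok ? 0 : -2; }

/* Runs the checks in order and stops at the first failure, with no goto. */
static int run_checks(bool hash_ok, bool signature_ok)
{
    int err = step_hash(hash_ok);
    if (err == 0) {
        err = step_signature(signature_ok);
    }
    return err;
}

/* Single exit point: cleanup happens exactly once, after the checks return. */
int verify_structured(bool hash_ok, bool signature_ok, int *cleanups)
{
    assert(cleanups != NULL);

    int err = run_checks(hash_ok, signature_ok);
    *cleanups += 1;  /* stands in for freeing buffers */
    return err;
}
```

Because each later check is nested under the success of the one before it, a stray duplicated line cannot jump past a check; at worst it repeats one.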
The conventional wisdom is that goto statements make source code harder to understand, harder to test and harder to maintain. Kernighan and Ritchie - UNIX pioneers and authors of the classic C programming textbook - said the goto statement is "infinitely abusable" and it "be used sparingly if at all." Before them, one of programming's giants, Edsger Djikstra, wrote in 1968 that "The go to statement ... is too much an invitation to make a mess of one's program"; see Go To Statement Considered Harmful. The goto creates spaghetti code. The landmark structured programming language PASCAL doesn't even have a goto statement! At Telectronics our coding standard prohibited without exception gotos in all implantable software.
Hard to understand, hard to test and hard to maintain is exactly what we see in the flawed iOS7 code. The critical bug never would have happened if Apple too banned the goto.
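As a rough illustration of what a goto-free structure can look like, here is a minimal sketch in C. It is an assumption-laden toy, not a rewrite of the iOS routine: check_step, verify_structured and self_check are hypothetical names, and each boolean input stands in for the outcome of one verification step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one verification step: returns 0 on success. */
static int check_step(bool ok) { return ok ? 0 : -1; }

/* Each step runs only if every earlier step succeeded, and there is a
   single exit point. No jump exists that could skip a later check. */
int verify_structured(bool hash_ok, bool signature_ok)
{
    int err = check_step(hash_ok);

    if (err == 0) {
        err = check_step(signature_ok);
    }
    return err;
}

/* Illustrative check of the three interesting cases. */
void self_check(void)
{
    assert(verify_structured(true, true) == 0);
    assert(verify_structured(true, false) != 0);
    assert(verify_structured(false, true) != 0);
}
```

In a structure like this, accidentally duplicating a line would most likely just repeat a check rather than bypass one, although no coding style removes the need for inspection and testing.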
Now, I am hardly going to suggest that fanatical coding standards and intellectual rigor are sufficient to make software secure (see also "Security Isn’t Secure"). It's unlikely that many commercial developers will be able to cost-justify exhaustive code walkthroughs when millions of lines are involved even in the humble mobile phone. It’s not as if lives depend on commercial software.
Or do they?!
Let’s leave aside that vexed question for now and return to fundamentals.
The #gotofail episode will become a textbook example of not just poor attention to detail, but moreover, the importance of disciplined logic, rigor, elegance, and fundamental coding theory.
A still deeper lesson in all this is the fragility of software. Prof Arie van Deursen nicely describes the iOS7 routine as "brittle". I want to suggest that all software is tragically fragile. It takes just one line of silly code to bring security to its knees. The sheer non-linearity of software – the ability for one line of software anywhere in a hundred million lines to have unbounded impact on the rest of the system – is what separates development from conventional engineering practice. Software doesn’t obey the laws of physics. No non-trivial software can ever be fully tested, and we have gone too far for the software we live with to be comprehensively proofread. We have yet to build the sorts of software tools and best practice and habits that would merit the title "engineering".
I’d like to close with a philosophical musing that might have appealed to my old mentors at Telectronics. We have reached a sort of pinnacle in post-modernism where the real world has come to pivot precariously on pure text. It is weird and wonderful that engineers are arguing about the layout of source code – as if they are poetry critics.
We have come to depend daily on great obscure texts, drafted not by people we can truthfully call "engineers" but by a largely anarchic community we would be better off calling playwrights.
With a bunch of exciting new members joining up on the eve of the RSA Conference, the FIDO Alliance is going from strength to strength. And they've just published the first public review drafts of their core "universal authentication" protocols.
An update to my Constellation Research report on FIDO is now available. Here's a preview.
The Go-To standards alliance in protocols for modern identity management
The FIDO Alliance – for Fast IDentity Online – is a fresh, fast growing consortium of security vendors and end users working out a new suite of protocols and standards to connect authentication endpoints to services. With an unusual degree of clarity in this field, FIDO envisages simply "doing for authentication what Ethernet did for networking".
Launched in early 2013, the FIDO Alliance has already grown to nearly 100 members, amongst which are heavyweights like Google, Lenovo, MasterCard, Microsoft and PayPal as well as a couple of dozen biometrics vendors, many of the leading Identity and Access Management solutions and service providers and several global players in the smartcard supply chain.
FIDO is different. The typical hackneyed elevator pitch in Identity and Access Management promises to "fix the password crisis" – usually by changing the way business is done. Most IDAM initiatives unwittingly convert clear-cut technology problems into open-ended business transformation problems. In contrast, FIDO's mission is refreshingly clear cut: it seeks to make strong authentication interoperable between devices and servers. When users have activated FIDO-compliant endpoints, reliable fine-grained information about their client environment becomes readily discoverable by any servers, which can then make access control decisions, each according to its own security policy.
With its focus, pragmatism and critical mass, FIDO is justifiably today's go-to authentication standards effort.
In February 2014, the FIDO Alliance announced the release of its first two protocol drafts, and a clutch of new members including powerful players in financial services, the cloud and e-commerce. Constellation notes in particular the addition to the board of security leader RSA and another major payments card, Discover. And FIDO continues to strengthen its vital “Relying Party” (service provider) representation with the appearance of Aetna, Goldman Sachs, Netflix and Salesforce.com.
It's time we fixed the Authentication plumbing
In my view, the best thing about FIDO is that it is not about federated identity but instead it operates one layer down in what we call the digital identity stack. This might seem to run against the IDAM tide, but it's refreshing, and it may help the FIDO Alliance sidestep the quagmire of identity policy mapping and legal complexities. FIDO is not really about the vexed general issue of "identity" at all! Instead, it's about low level authentication protocols; that is, the plumbing.
The FIDO Alliance sets out its mission as follows:
Change the nature of online authentication by:
- Developing technical specifications that define an open, scalable, interoperable set of mechanisms that reduce the reliance on passwords to authenticate users.
- Operating industry programs to help ensure successful worldwide adoption of the Specifications.
- Submitting mature technical Specification(s) to recognized standards development organization(s) for formal standardization.
The engineering problem underlying Federated Identity is actually pretty simple: if we want to have a choice of high-grade physical, multi-factor "keys" used to access remote services, how do we convey reliable cues to those services about the type of key being used and the individual who's said to be using it? If we can solve that problem, then service providers and Relying Parties can sort out for themselves precisely what they need to know about the users, sufficient to identify and authenticate them.
All of this leaves the 'I' in the acronym "FIDO" a little contradictory. It's such a cute name (alluding of course to the Internet dog) that it's unlikely to change. Instead, I overheard that the acronym might go the way of "KFC" where eventually it is no longer spelled out and just becomes a word in and of itself.
FIDO Alliance Board Members
- CrucialTec (manufactures innovative user input devices for mobiles)
- Discover Card
- Nok Nok Labs (a specialist authentication server software company)
- NXP Semiconductors (a global supplier of card chips, SIMs and Secure Elements)
- Oberthur Technologies (a multinational smartcard and mobility solutions provider)
- Synaptics (fingerprint biometrics)
- Yubico (the developer of the YubiKey PKI enabled 2FA token).
FIDO Alliance Board Sponsor Level Members
- EyeLock Inc.
- Fingerprint Cards AB
- Goldman Sachs
- IDEX ASA
- Next Biometrics Group
- Oesterreichische Staatsdruckerei GmbH
- Ping Identity
- Wave Systems
Stay tuned for the updated Constellation Research report.
If anonymity is important, what is the legal basis for defending it?
I find that conventional data privacy law in most places around the world already protects anonymity, insofar as the act of de-anonymization represents an act of PII Collection - the creation of a named record. As such, de-anonymization cannot be lawfully performed without an express need to do so, or consent.
Cynics have been asking the same rhetorical question "is privacy dead?" for at least 40 years. Certainly information technology and ubiquitous connectivity have made it nearly impossible to hide, and so anonymity is critically ill. But privacy is not the same thing as secrecy; privacy is a state where those who know us, respect the knowledge they have about us. Privacy generally doesn't require us hiding from anyone; it requires restraint on the part of those who hold Personal Information about us.
The typical public response to data breaches, government surveillance and invasions like social media facial recognition is vociferous. People in general energetically assert their rights not to be tracked online, and not to have their personal information exploited behind their backs. These reactions show that the idea of privacy is alive and well.
The end of anonymity perhaps
Against a backdrop of spying revelations and excesses by social media companies especially in regards to facial recognition, there have been recent calls for a "new jurisprudence of anonymity"; see Yale law professor Jed Rubenfeld writing in the Washington Post of 13 Jan 2014. I wonder if there is another way to crack the nut? Because any new jurisprudence is going to take a very long time.
Instead, I suggest we leverage the way most international privacy law and privacy experience -- going back decades -- is technology neutral with regard to the method of collection. In some jurisdictions like Australia, the term "collection" is not even defined in privacy law. Instead, the law just uses the normal plain English sense of the word, when it frames principles like Collection Limitation: basically, you are not allowed to collect (by any means) Personally Identifiable Information without a good, express reason. It means that if PII gets into a data system, the system is accountable under privacy law for that PII, no matter how it got there.
This technology neutral view of PII collection has satisfying ramifications for all the people who intuit that Big Data has got too "creepy". We can argue that if a named record is produced afresh by a Big Data process (especially if that record is produced without the named person being aware of it, and from raw data that was originally collected for some other purpose) then that record has logically been collected. Whether PII is collected directly, or collected indirectly, or is in fact created by an obscure process, privacy law is largely agnostic.
Prof Rubenfeld wrote:
- "The NSA program isn’t really about gathering data. It's about mining data. All the data are there already, digitally stored and collected by telecom giants, just waiting." [italics in original]
I suggest that the output of the data mining, if it is personally identifiable and especially if it has been rendered identifiable by processing previously anonymous raw data, is a fresh collection by the mining operation. As such, the miners should be accountable for their newly minted PII, just as though they had gathered it directly from the persons concerned.
For now, I don't want to go further and argue the rights and wrongs of surveillance. I just want to show a new way to frame the privacy questions in surveillance and big data, making use of existing jurisprudence. If I am right and the NSA is in effect collecting PII as it goes about its data mining, then that provides a possibly fresh understanding of what's going on, within which we can objectively analyse the rights and wrongs.
I am actually the first to admit that within this frame, the NSA might still be justified in mining data, and there might be no actual technical breach of information privacy law, if for instance the NSA enjoys a law enforcement exemption. These are important questions that need to be debated, but elsewhere (see my recent blog on our preparedness to actually have such a debate). My purpose right now is to frame a way to defend anonymity using as much existing legal infrastructure as possible.
But Collection is not limited everywhere
There is an important legal-technical question in all this: Is the collection of PII actually regulated? In Europe, Australia, New Zealand and in dozens of countries, collection is limited, but in the USA, there is no general restriction against collecting PII. America has no broad data protection law, and in any case, the Fair Information Practice Principles (FIPPs) don't include a Collection Limitation principle.
So there may be few regulations in the USA that would carry my argument there! Nevertheless, surely we can use international jurisprudence in Collection Limitation instead of creating new American jurisprudence around anonymity?
So I'd like to put the following questions to Jed Rubenfeld:
- Do technology neutral Collection Limitation Principles in theory provide a way to bring de-anonymised data into scope for data privacy laws? Is this a way to address people's concerns with Big Data?
- How does international jurisprudence around Collection Limitation translate to American schools of legal thought?
- Does this way of looking at the problem create new impetus for Collection Limitation to be introduced into American privacy principles, especially the FIPPs?
Appendix: "Applying Information Privacy Norms to Re-Identification"
In 2013 I presented some of these ideas to an online symposium at the Harvard Law School Petrie-Flom Center, on the Law, Ethics & Science of Re-identification Demonstrations. What follows is an extract from that presentation, in which I spell out carefully the argument -- which was not obvious to some at the time -- that when genetics researchers combine different data sets to demonstrate re-identification of donated genomic material, they are in effect collecting patient PII. I argue that this type of collection should be subject to ethics committee approval just as if the researchers were collecting the identities from the patients directly.
... I am aware of two distinct re-identification demonstrations that have raised awareness of the issues recently. In the first, Yaniv Erlich [at MIT's Whitehead Institute] used what I understand are new statistical techniques to re-identify a number of subjects that had donated genetic material anonymously to the 1000 Genomes project. He did this by correlating genes in the published anonymous samples with genes in named samples available from genealogical databases. The 1000 Genomes consent form reassured participants that re-identification would be "very hard". In the second notable demo, Latanya Sweeney re-identified volunteers in the Personal Genome Project using her previously published method of using a few demographic values (such as date of birth, sex and postal code) extracted from the otherwise anonymous records.
A great deal of the debate around these cases has focused on the consent forms and the research subjects’ expectations of anonymity. These are important matters for sure, yet for me the ethical issue in de-anonymisation demonstrations is more about the obligations of third parties doing the identification who had nothing to do with the original informed consent arrangements. The act of recording a person’s name against erstwhile anonymous data represents a collection of personal information. The implications for genomic data re-identification are clear.
Let’s consider Subject S who donates her DNA, ostensibly anonymously, to a Researcher R1, under some consent arrangement which concedes there is a possibility that S will be re-identified. And indeed, some time later, an independent researcher R2 does identify S and links her to the DNA sample. The fact is that R2 has collected personal information about S. If R2 has no relationship with S, then S has not consented to this new collection of her personal information.
Even if the consent form signed at the time of the original collection includes a disclaimer that absolute anonymity cannot be guaranteed, re-identifying the DNA sample later represents a new collection, one that has been undertaken without any consent. Given that S has no knowledge of R2, there can be no implied consent in her original understanding with R1, even if absolute anonymity was disclaimed.
Naturally the re-identification demonstrations have served a purpose. It is undoubtedly important that the limits of anonymity be properly understood, and the work of Yaniv and Latanya contributes to that. Nevertheless, these demonstrations were undertaken without the knowledge much less the consent of the individuals concerned. I contend that bioinformaticians using clever techniques to attach names to anonymous samples need ethics approval, just as they would if they were taking fresh samples from the people concerned.
See also my letter to the editor of Science magazine.
That is, information security is not intellectually secure. Almost every precept of orthodox information security is ready for a shake-up. Infosec practices are built on crumbling foundations.
UPDATE: I've been selected to speak on this topic at the 2014 AusCERT Conference - the biggest information security event in Australasia.
The recent tragic experience of data breaches -- at Target, Snapchat, Adobe Systems and RSA to name a very few -- shows that orthodox information security is simply not up to the task of securing serious digital assets. We have to face facts: no amount of today's conventional security is ever going to protect assets worth billions of dollars.
Our approach to infosec is based on old management process standards (which can be traced back to ISO 9000) and a ponderous technology neutrality that overly emphasises people and processes. The things we call "Information Security Management Systems" are actually not systems that any engineer would recognise but instead are flabby sets of documents and audit procedures.
"Continuous security improvement" in reality is continuous document engorgement.
Most ISMSs sit passively on shelves and share drives doing nothing for 12 months, until the next audit, when the papers become the centre of attention (not the actual security). Audit has become a sick joke. ISO 27000 and PCI assessors have the nerve to tell us their work only provides a snapshot, and if a breach occurs between visits, it's not their fault. In their words they admit therefore that audits do not predict performance between audits. While nobody is looking, our credit card numbers are about as secure as Schrodinger's Cat!
The deep problem is that computer systems have become so very complex and so very fragile that they are not manageable by traditional means. Our standard security tools, including Threat & Risk Assessment and hierarchical layered network design, are rooted in conventional engineering. Failure Modes & Criticality Analysis works well in linear systems, where small perturbations have small effects, but IT is utterly unlike this. The smallest most trivial omission in software or in a server configuration can have dire and unlimited consequences. It's like we're playing Jenga.
Update: Barely a month after I wrote this blog, we heard about the "goto fail" bug in the Apple iOS SSL routines, which resulted from one spurious line of code. It might have been more obvious to the programmer and/or any code reviewer had the code been indented differently or if curly braces were used rigorously.
Security needs to be re-thought from the ground up. We need some bigger ideas.
We need less rigid, less formulaic security management structures, to encourage people at the coal face to exercise their judgement and skill. We need straight talking CISOs with deep technical experience in how computers really work, and not 'suits' more focused on the C-suite than the dev teams. We have to stop writing impenetrable hierarchical security policies and SOPs (in the "waterfall" manner we recognised decades ago fails to do much good in software development). And we need to equate security with software quality and reliability, and demand that adequate time and resources be allowed for the detailed work to be done right.
If we can't protect credit card numbers today, we urgently need to do things differently, standing as we are on the brink of the Internet of Things.
Yesterday it was reported by The Verge that anonymous hackers have accessed Snapchat's user database and posted 4.6 million user names and phone numbers. In an apparent effort to soften the blow, two digits of the phone numbers were redacted. So we might assume this is a "white hat" exercise, designed to shame Snapchat into improving their security. Indeed, a few days ago Snapchat themselves said they had been warned of vulnerabilities in their APIs that would allow a mass upload of user records.
The response of many has been, well, so what? Some people have casually likened Snapchat's list to a public White Pages; others have played it down as "just email addresses".
Let's look more closely. The leaked list was not in fact public names and phone numbers; it was user names and phone numbers. User names might often be email addresses but these are typically aliases; people frequently choose email addresses that reveal little or nothing of their real world identity. We should assume there is intent in an obscure email address for the individual to remain secret.
Identity theft has become a highly organised criminal enterprise. Crime gangs patiently acquire multiple data sets over many months, sometimes years, gradually piecing together detailed personal profiles. It's been shown time and time again by privacy researchers (perhaps most notably Latanya Sweeney) that re-identification is enabled by linking diverse data sets. And for this purpose, email addresses and phone numbers are superbly valuable indices for correlating an individual's various records. Your email address is common across most of your social media registrations. And your phone number allows your real name and street address to be looked up from reverse White Pages. So the Snapchat breach could be used to join aliases or email addresses to real names and addresses via the phone numbers. For a social engineering attack on a call centre -- or even to open a new bank account -- an identity thief can go an awful long way with real name, street address, email address and phone number.
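The linking step described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the record layouts, the function names (lookup, link_records, self_check) and all of the sample data are made up, and a real reverse directory would obviously be far larger.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical leaked record: an alias (user name) plus a phone number. */
struct leaked { const char *username; const char *phone; };

/* Hypothetical reverse-directory entry: phone number to name and address. */
struct listing { const char *phone; const char *name; const char *address; };

/* Returns the directory entry for a phone number, or NULL if there is none. */
const struct listing *lookup(const struct listing *dir, size_t n,
                             const char *phone)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(dir[i].phone, phone) == 0) {
            return &dir[i];
        }
    }
    return NULL;
}

/* Joins the two data sets on phone number, prints each alias that can be
   tied to a real name, and returns how many aliases were linked. */
size_t link_records(const struct leaked *leak, size_t n_leak,
                    const struct listing *dir, size_t n_dir)
{
    size_t linked = 0;

    for (size_t i = 0; i < n_leak; i++) {
        const struct listing *hit = lookup(dir, n_dir, leak[i].phone);
        if (hit != NULL) {
            printf("%s -> %s, %s\n", leak[i].username, hit->name, hit->address);
            linked++;
        }
    }
    return linked;
}

/* Entirely made-up sample data, for illustration only. */
static const struct leaked LEAK[] = {
    { "nightowl_77", "555-0101" },
    { "quiet.fox",   "555-0199" },
};
static const struct listing DIRECTORY[] = {
    { "555-0101", "A. Example",     "1 Sample Street" },
    { "555-0142", "B. Placeholder", "2 Sample Street" },
};

/* Illustrative check: one of the two aliases shares a number with the
   directory, so exactly one alias is re-identified. */
void self_check(void)
{
    assert(link_records(LEAK, 2, DIRECTORY, 2) == 1);
}
```

Neither list names the person behind an alias on its own; it is the shared phone number that does the identifying once the two are joined.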
I was asked in an interview to compare the theft of phone numbers with the theft of social security numbers. I surprised the interviewer when I said phone numbers are probably even more valuable to the highly organised ID thief, for they can be used to index names in public directories, and to link different data sets, in ways that SSNs (or credit card numbers for that matter) cannot.
So let us start to treat all personal information -- especially when aggregated in bulk -- more seriously! And let's be more cautious in the way we categorise personal or Personally Identifiable Information (PII).
Importantly, most regulatory definitions of PII already embody the proper degree of caution. Look carefully at the US government definition of Personally Identifiable Information:
- information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual (underline added).
This means that items of data can constitute PII if other data can be combined to identify the person concerned. That is, the fragments are regarded as PII even if it is the whole that does the identifying.
And remember that the middle I in PII stands for Identifiable, and not, as many people presume, Identifying. To meet the definition of PII, data need not uniquely identify a person; it merely needs to be directly or indirectly identifiable with a person. And this is how it should be when we heed the way information technologies enable identification through linkages.
Almost anywhere else in the world, data stores like Snapchat's would automatically fall under data protection and information privacy laws; regulators would take a close look at whether the company had complied with the OECD Privacy Principles, and whether Snapchat's security measures were fit for purpose given the PII concerned. But in the USA, companies and commentators alike still have trouble working out how serious these breaches are. Each new breach is treated in an ad hoc manner, often with people finessing the difference between credit card numbers -- as in the recent Target breach -- and "mere" email addresses like those in the Snapchat and Epsilon episodes.
Surely the time has come to simply give proper regulatory protection to all PII.