Lockstep


Calling for a uniform approach to card fraud, offline and on

This blog is an edited extract from an article of the same name, first published in the Journal of Internet Banking and Commerce, December 2012, vol. 17, no.3.

Abstract

The credit card payments system is a paragon of standardisation. No other industry has such a strong history of driving and adopting uniform technologies, infrastructure and business processes. No matter where you keep a bank account, you can use a globally branded credit card to go shopping in almost every corner of the world. Seamless convenience is underpinned by the universal Four Party settlement model, and a long-standing card standard that works the same with ATMs and merchant terminals everywhere.

So with this determination to facilitate trustworthy and supremely convenient spending everywhere, it’s astonishing that the industry has yet to standardise Internet payments. Most of the world has settled on the EMV standard for in-store transactions, but online we use a wide range of confusing and largely ineffective security measures. As a result, Card Not Present (CNP) fraud is growing unchecked. This article argues that all card payments should be properly secured using standardised hardware. In particular, CNP transactions should use the very same EMV chip and cryptography as Card Present payments do.

Skimming and Carding

With “carding”, criminals replicate stolen customer data on blank cards and use those card copies in regular merchant terminals. “Skimming” is one way of stealing card data, by running a card through a copying device when the customer isn’t looking (but it’s actually more common for card data to be stolen in bulk from compromised merchant and processor databases).

A magnetic stripe card stores the customer’s details as a string of ones and zeroes, and presents them to a POS terminal or ATM in the clear. It’s child’s play for criminals to scan the bits and copy them to a blank card.

The industry responded to skimming and carding with EMV (aka Chip-and-PIN). EMV replaces the magnetic storage with an integrated circuit, but more importantly, it secures the data transmitted from card to terminal. EMV works by first digitally signing those ones and zeros in the chip, and then verifying the signature at the terminal. The signing uses a Private Key unique to the cardholder and held safely inside the chip where it cannot be tampered with by fraudsters. It is not feasible to replicate the digital signature without having access to the inner workings of the chip, and thus EMV cards resist carding.
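
To make the mechanism concrete, here is a minimal sketch of the sign-then-verify idea in C, using OpenSSL 3.0's EVP API. It is emphatically not EMV's actual data formats, command set or key hierarchy (real cards use issuer certificate chains and transaction cryptograms), and the field names in the message are invented; the point is simply that a terminal can check the chip's signature, and a copy of the data without the private key cannot produce one.

    #include <stdio.h>
    #include <openssl/evp.h>

    /*
     * Minimal sketch of the sign-then-verify idea behind EMV; NOT the real EMV
     * protocol. The "card" holds a private key and signs the transaction data;
     * the "terminal" verifies the signature. A counterfeit card cannot produce
     * a valid signature because it has no access to the genuine chip's key.
     * Error handling is omitted for brevity. Requires OpenSSL 3.0+.
     */
    int main(void)
    {
        /* In a real card the key pair is injected at personalisation time and
         * the private half never leaves the chip; here we just generate one. */
        EVP_PKEY *key = EVP_PKEY_Q_keygen(NULL, NULL, "EC", "P-256");

        const unsigned char txn[] = "PAN=4111...;AMOUNT=42.00;NONCE=83afc1";
        unsigned char sig[256];
        size_t siglen = sizeof sig;

        /* "Card" side: sign the transaction data with the private key. */
        EVP_MD_CTX *sctx = EVP_MD_CTX_new();
        EVP_DigestSignInit(sctx, NULL, EVP_sha256(), NULL, key);
        EVP_DigestSign(sctx, sig, &siglen, txn, sizeof txn - 1);
        EVP_MD_CTX_free(sctx);

        /* "Terminal" side: verify the signature over the same data. */
        EVP_MD_CTX *vctx = EVP_MD_CTX_new();
        EVP_DigestVerifyInit(vctx, NULL, EVP_sha256(), NULL, key);
        int genuine = EVP_DigestVerify(vctx, sig, siglen, txn, sizeof txn - 1);
        printf("genuine transaction: %s\n", genuine == 1 ? "accepted" : "rejected");

        /* Replayed or altered data fails verification. */
        const unsigned char forged[] = "PAN=4111...;AMOUNT=9999.00;NONCE=83afc1";
        EVP_DigestVerifyInit(vctx, NULL, EVP_sha256(), NULL, key);
        int copied = EVP_DigestVerify(vctx, sig, siglen, forged, sizeof forged - 1);
        printf("altered transaction: %s\n", copied == 1 ? "accepted" : "rejected");

        EVP_MD_CTX_free(vctx);
        EVP_PKEY_free(key);
        return 0;
    }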

Online Card Fraud

Conventional Card Not Present (CNP) transactions are vulnerable because, a lot like the old mag stripe cards, they rest on clear text cardholder data. On its own, a merchant server cannot tell the difference between the original card data and a copy, just as a terminal cannot tell an original mag stripe card from a criminal's copy.

So CNP fraud is just online carding.

Despite the simplicity of the root problem, the past decade has seen a bewildering patchwork of flimsy and expensive online payments fixes. Various One Time Passwords have come and gone, from scratchy cards to electronic key fobs. Temporary SMS codes have been popular but were recently declared unsafe by the Communications Alliance in Australia, a policy body representing the major mobile carriers.

“3D Insecure”

Meanwhile, extraordinary resources have been squandered on the novel “3D Secure” scheme (MasterCard “SecureCode” and “Verified by Visa”). 3D Secure take-up is piecemeal, and it’s widely derided by merchants and customers alike. It is often blocked by browsers, and it throws up odd-looking messages that can appear like a phishing attack or other malfunction. Moreover, it upsets the underlying Four Party settlement architecture, slowing transactions to a crawl and introducing untold legal complexities. Payments regulators too appear to have lost interest in 3D Secure.

So why doesn’t the card payments industry go back to its roots, preserve its global Four Party settlement architecture and standards, and tackle the real issue?

Kill two birds with one chip

We could stop most online fraud by using the same chip technologies we deployed to kill off skimming and carding.

It is technically simple to reproduce the familiar card-present user experience on a standard computer. It would just take the will of the financial services industry to make payment by smartcard the norm online. Computers with built-in smartcard readers have come and gone in Western markets, but they remain commonplace in parts of Eastern Europe and Asia where smartcards are routinely used for e-health and online voting.

With dual interface and contactless smartcards, the interface options open right up. The Dell E series Latitudes have contactless card readers as standard (aimed at the US Personal Identity Verification, or PIV, market). Most mobile devices now feature NFC, or Near Field Communication, a special purpose device-to-device networking capability which until now has mostly been used to emulate a payment card. But NFC tablets and smartphones can also switch into reader mode and act as a smartcard terminal; researchers have recently demonstrated how to read a smartcard via NFC to authenticate the cardholder to a mobile device.

As an alternative, the SIM or other "Secure Element" of most mobile devices could be used to digitally sign card transactions directly, in place of the card. That’s essentially how NFC payment apps work for Card Present transactions – but nobody has yet made the leap to use smart phone hardware security for Card Not Present.

Using a smart payment card with a computer could and should be as easy as using payWave or PayPass.

Conclusion: Hardware security

All serious payments systems use hardware security. The classic examples include SIM cards, EMV, the Hardware Security Modules mandated by regulators in all ATMs, and the Secure Elements of NFC devices. With well designed hardware security, we gain a lasting upper hand in the criminal arms race.

The Internet and mobile channels will one day overtake the traditional physical payments medium. Indeed, commentators already like to say that the “digital economy” is simply the economy. Therefore, let us stop struggling with stopgap Internet security measures, and let us stop pretending that PCI-DSS audits will stop organised crime stealing card numbers by the million. Instead, we should kill two birds with one stone, and use chip technology to secure both card present and CNP transactions, to deliver the same high standards of usability and security in all channels.

Posted in Smartcards, Security, Payments, Fraud

Who's listening to Ed Snowden?

In one of the most highly anticipated sessions ever at the annual South-by-Southwest (SXSW) culture festival, NSA whistle blower Ed Snowden appeared via live video link from Russia. He joined two privacy and security champions from the American Civil Liberties Union – Chris Soghoian and Ben Wizner – to canvass the vexed tensions between intelligence and law enforcement, personal freedom, government accountability and digital business models.

These guys traversed difficult ground, with respect and much nuance. They agreed the issues are tough, and that proper solutions are non-obvious and slow-coming. The transcript is available here.

Yet afterwards the headlines and tweet stream were dominated by "Snowden's Tips" for personal online security. It was as if Snowden had been conducting a self-help workshop or a Cryptoparty. He was reported to recommend we encrypt our hard drives, encrypt our communications, and use Tor (the free and open source anonymising network). These are mostly fine suggestions, but I am perplexed why they should be the main takeaways from a complex discussion. Are people listening to Snowden's broader policy lessons? I fear not. I believe people still conflate secrecy and privacy. At the macro level, the confusion makes it difficult to debate national security policy properly; at the micro level, even if crypto were practical for typical citizens, it is not a true privacy measure. Citizens need so much more than secrecy technologies, whether it's SSL-always-on at web sites or do-it-yourself encryption.

Ed Snowden is a remarkably measured and thoughtful commentator on national security. Despite being hounded around the world, he is not given to sound bites. His principal concerns appear to be around public accountability, oversight and transparency. He speaks of the strengths and weaknesses of the governance systems already in place; he urges Congress to hold security agency heads to account.

When drawn on questions of technology, he doesn't dispense casual advice; instead he calls for multifaceted responses to our security dilemmas: more cryptological research, better random number generators, better testing, more robust cryptographic building blocks and more careful product design. Deep, complicated engineering stuff.

So how did the media, both mainstream and online alike, distill Snowden's sweeping analysis of politics, policy and engineering into three sterile and quasi-survivalist snippets?

Partly it's due to the good old sensationalism of all modern news media: everyone likes a David-and-Goliath angle where individuals face off against pitiless governments. And there's also the ruthless compression: newspapers cater for an audience with school-age reading levels and attention spans, and Twitter clips our contributions to 140 characters.

But there is also a deeper over-simplification of privacy going on which inhibits our progress.

Too often, people confuse privacy with secrecy. Privacy gets framed as a need to hide from prying eyes, and from that starting position, many advocates descend into a combative, everyone-for-themselves mindset.

However privacy has very little to do with secrecy. We shouldn't have to go underground to enjoy that fundamental human right to be let alone. The social reality is that most of us wish to lead rich and quite public lives. We actually want others to know us – to know what we do, what we like, and what we think – but all within limits. Digital privacy (or more clinically, data protection) is not about hiding; rather it is a state where those who know us are restrained in what they do with the knowledge they have about us.

Privacy is the protection you need when your affairs are not confidential!

So encryption is a sterile and very limited privacy measure. As the SXSW panellists agreed, today's encryption tools really are the preserve of deep technical specialists. Ben Wizner quipped that if the question is how can average users protect themselves online, and the answer is Tor, then "we have failed".

And the problems with cryptography are not just usability and customer experience. A fundamental challenge with the best encryption is that everyone needs to be running the tools. You cannot send out encrypted email unilaterally – you need to first make sure all your correspondents have installed the right software and they've got trusted copies of your encryption keys, or they won't be able to unscramble your messages.

Chris Soghoian also nailed the business problem that current digital revenue models are largely incompatible with encryption. The wondrous free services we enjoy from the Googles and Facebooks of the world are funded in the main by mining our data streams, figuring out our interests, habits and connections, and monetising that synthesised information. The web is in fact bankrolled by surveillance – by Big Business as opposed to government.

End-to-end encryption prevents data mining and would ruin the business model of the companies we've become attached to. If we were to get serious with encryption, we may have to cough up the true price for our modern digital lifestyles.

The SXSW privacy and security panellists know all this. Snowden in particular spent much of his time carefully reiterating many of the basics of data privacy. For instance, he echoed the Collection Limitation Principle when he said of large companies that they "can't collect any data; [they] should only collect data and hold it for as long as necessary for the operation of the business". And the Openness Principle: "data should not be collected without people's knowledge and consent". If I were to summarise Snowden's SXSW presentation, I'd say privacy will only be improved by reforming the practices of both governments and big businesses, and by putting far more care into digital product development. Ed Snowden himself doesn't promote neat little technology tips.

It's still early days for the digital economy. We're experiencing an online re-run of the Wild West, with humble users understandably feeling forced to take measures into their own hands. So many individuals have become hungry for defensive online tools and tips. But privacy is more about politics and regulation than technology. I hope that people listen more closely to Ed Snowden on policy, and that his lasting legacy is more about legal reform and transparency than Do-It-Yourself encryption.

Posted in Security, Privacy, Internet

The strengths and weaknesses of Data Privacy in the Age of Big Data

This is the abstract of a current privacy conference proposal.

Synopsis

Many Big Data and online businesses proceed on a naive assumption that data in the "public domain" is up for grabs; technocrats are often surprised that conventional data protection laws can be interpreted to cover the extraction of PII from raw data. On the other hand, orthodox privacy frameworks don't cater for the way PII can be created in future from raw data collected today. This presentation will bridge the conceptual gap between data analytics and privacy, and offer new dynamic consent models to civilize the trade in PII for goods and services.

Abstract

It’s often said that technology has outpaced privacy law, yet by and large that's just not the case. Technology has certainly outpaced decency, with Big Data and biometrics in particular becoming increasingly invasive. However OECD data privacy principles set out over thirty years ago still serve us well. Outside the US, rights-based privacy law has proven effective against today's technocrats' most worrying business practices, based as they are on taking liberties with any data that comes their way. To borrow from Niels Bohr, technologists who are not surprised by data privacy have probably not understood it.

The cornerstone of data privacy in most places is the Collection Limitation principle, which holds that organizations should not collect Personally Identifiable Information beyond their express needs. It is the conceptual cousin of security's core Need-to-Know Principle, and the best starting point for Privacy-by-Design. The Collection Limitation principle is technology neutral and thus blind to the manner of collection. Whether PII is collected directly by questionnaire or indirectly via biometric facial recognition or data mining, data privacy laws apply.

It’s not for nothing we refer to "data mining". But few of those unlicensed data gold diggers seem to understand that the synthesis of fresh PII from raw data (including the identification of anonymous records like photos) is merely another form of collection. The real challenge in Big Data is that we don’t know what we’ll find as we refine the techniques. With the best will in the world, it is hard to disclose in a conventional Privacy Policy what PII might be collected through synthesis down the track. The age of Big Data demands a new privacy compact between organisations and individuals. High minded organizations will promise to keep people abreast of new collections and will offer ways to opt in, and out and in again, as the PII-for-service bargain continues to evolve.

Posted in Social Networking, Privacy, Biometrics, Big Data

gotofail and a defence of purists

The widely publicised and very serious "gotofail" bug in iOS7 took me back ...

Early in my career I spent seven years in a very special software development environment. I didn't know it at the time, but this experience set the scene for much of my understanding of information security two decades later. I was in a team with a rigorous software development lifecycle; we attained ISO 9001 certification way back in 1998. My company deployed 30 software engineers in product development, 10 of whom were dedicated to testing. Other programmers elsewhere independently wrote manufacture test systems. We spent a lot of time researching leading edge development methodologies, such as Cleanroom, and formal specification languages like Z.

We wrote our own real time multi-tasking operating system; we even wrote our own C compiler and device drivers! Literally every single bit of the executable code was under our control. "Anal" doesn't even begin to describe our corporate culture.

Why all the fuss? Because at Telectronics Pacing Systems, over 1986-1990, we wrote the code for the world's first software controlled implantable defibrillator, the Guardian 4210.

The team spent relatively little time actually coding; we were mostly occupied writing and reviewing documents. And then there were the code inspections. We walked through pseudo-code during spec reviews, and source code during unit validation. And before finally shipping the product, we inspected the entire 40,000 lines of source code. That exercise took five people two months.

For critical modules, like the kernel and error correction routines, we walked through the compiled assembly code. We took the time to simulate the step-by-step operation of the machine code using pen and paper, each team member role-playing parts of the microprocessor (Phil would pretend to be the accumulator, Lou the program counter, me the index register). By the end of it all, we had several people who knew the defib's software like the back of their hand.

And we had demonstrably the most reliable real time software ever written. After amassing several thousand implant-years, we measured a bug rate of less than one in 10,000 lines.

The implant software team had a deserved reputation as pedants. Over 25 person years, the average rate of production was one line of debugged C per team member per day. We were painstaking, perfectionist, purist. And argumentative! Some of our debates were excruciating to behold. We fought over definitions of “verification” and “validation”; we disputed algorithms and test tools, languages and coding standards. We were even precious about code layout, which seemed to some pretty silly at the time.

Yet 20 years later, purists are looking good.

Last week saw widespread attention to a bug in Apple's iOS operating system which rendered website security impotent. The problem arose from a single superfluous line of code – an extra goto statement – that nullified checking of SSL connections, leaving users totally vulnerable to fake websites. The Twitterverse nicknamed the flaw #gotofail.
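
For readers who have not seen the flaw, here is a simplified, compilable sketch of the pattern; it is modelled on, but is not, Apple's published source. The duplicated goto is not governed by the if statement above it, so it always executes, the final check below it becomes dead code, and the routine reports success regardless.

    #include <stdio.h>

    /* Simplified illustration of the #gotofail pattern; not Apple's actual code.
     * The second, duplicated goto runs unconditionally: the final check becomes
     * dead code and err is still 0 (success) at the fail label. */
    static int verify_handshake(int update_err, int final_err)
    {
        int err = 0;

        if ((err = update_err) != 0)
            goto fail;
            goto fail;                 /* spurious duplicate: always executes */
        if ((err = final_err) != 0)    /* never reached                       */
            goto fail;

    fail:
        /* cleanup would go here */
        return err;                    /* 0 == "signature verified"           */
    }

    int main(void)
    {
        /* The final verification step "fails" (non-zero), yet the routine
         * still reports success, just as in the iOS7 SSL flaw. */
        printf("verify_handshake returned %d\n", verify_handshake(0, 1));
        return 0;
    }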

There are all sorts of interesting quality control questions in the #gotofail experience.


  • Was the code inspected? Do companies even do code inspections these days?
  • The extra goto was said to be a recent change to the source; if that's the case, what regression testing was performed on the change?
  • How are test cases selected?
  • For something as important as SSL, are there not test rigs with simulated rogue websites to stress test security systems before release?

There seem to have been egregious shortcomings at every level: code design, code inspection and testing.

A lot of attention is being given to the code layout. The spurious goto is indented in such a way that it appears to be part of a branch, but it is not. If curly braces were used religiously, or if an automatic indenting tool was applied, then the bug would have been more obvious (assuming that the code gets inspected). I agree of course that layout and coding standards are important, but there is a much more robust way to make source code clearer.

Beyond the lax testing and quality control, there is also a software-theoretic question in all this that is getting hardly any attention: Why are programmers using ANY goto statements at all?

I was taught at college and later at Telectronics to avoid goto statements at all cost. Yes, on rare occasions a goto statement makes the code more compact, but with care, a program can almost always be structured to be compact in other ways. Don't programmers care anymore about elegance in logic design? Don't they make efforts to set out their code in a rigorous structured manner?

The conventional wisdom is that goto statements make source code harder to understand, harder to test and harder to maintain. Kernighan and Ritchie – UNIX pioneers and authors of the classic C programming textbook – called the goto statement "infinitely abusable" and suggested it "be used sparingly, if at all". Before them, one of programming's giants, Edsger Dijkstra, wrote in 1968 that "The go to statement ... is too much an invitation to make a mess of one's program"; see Go To Statement Considered Harmful. The goto creates spaghetti code. Pascal, the landmark structured programming language, was designed so that the goto is scarcely ever needed. At Telectronics our coding standard prohibited gotos in all implantable software, without exception.

Hard to understand, hard to test and hard to maintain is exactly what we see in the flawed iOS7 code. The critical bug never would have happened if Apple, too, had banned the goto.
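
For contrast, here is a sketch of the same chain of checks written without goto; each step runs only when its predecessor has succeeded, and a stray duplicated line inside the braces would be conspicuous and could not silently bypass the remaining checks.

    #include <stdio.h>

    /* The same handshake checks in a structured style: no shared jump target,
     * and the cleanup path is reached from every branch. */
    static int verify_handshake_structured(int update_err, int final_err)
    {
        int err = update_err;
        if (err == 0) {
            err = final_err;   /* evaluated only when the earlier step passed */
        }
        /* common cleanup goes here */
        return err;
    }

    int main(void)
    {
        /* With the final check failing, the structured version correctly
         * reports the error. */
        printf("structured verify returned %d\n", verify_handshake_structured(0, 1));
        return 0;
    }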

Now, I am hardly going to suggest that fanatical coding standards and intellectual rigor are sufficient to make software secure (see also "Security Isn't Secure"). It's unlikely that many commercial developers will be able to cost-justify exhaustive code walkthroughs when millions of lines are involved even in the humble mobile phone. It’s not as if lives depend on commercial software.

Or do they?!

Let’s leave aside that vexed question for now and return to fundamentals.

The #gotofail episode will become a text book example of not just poor attention to detail, but moreover, the importance of disciplined logic, rigor, elegance, and fundamental coding theory.

A still deeper lesson in all this is the fragility of software. Prof Arie van Deursen nicely describes the iOS7 routine as "brittle". I want to suggest that all software is tragically fragile. It takes just one line of silly code to bring security to its knees. The sheer non-linearity of software – the ability for one line of software anywhere in a hundred million lines to have unbounded impact on the rest of the system – is what separates software development from conventional engineering practice. Software doesn’t obey the laws of physics. No non-trivial software can ever be fully tested, and we have gone too far for the software we live with to be comprehensively proofread. We have yet to build the sorts of software tools, best practices and habits that would merit the title "engineering".

I’d like to close with a philosophical musing that might have appealed to my old mentors at Telectronics. We have reached a sort of pinnacle in post-modernism where the real world has come to pivot precariously on pure text. It is weird and wonderful that engineers are arguing about the layout of source code – as if they are poetry critics.

We have come to depend daily on great obscure texts, drafted not by people we can truthfully call "engineers" but by a largely anarchic community we would be better off calling playwrights.

Posted in Software engineering, Security

FIDO Alliance goes from strength to strength

With a bunch of exciting new members joining up on the eve of the RSA Conference, the FIDO Alliance is going from strength to strength. And they've just published the first public review drafts of their core "universal authentication" protocols.

An update to my Constellation Research report on FIDO is now available. Here's a preview.

The Go-To standards alliance in protocols for modern identity management

The FIDO Alliance – for Fast IDentity Online – is a fresh, fast growing consortium of security vendors and end users working out a new suite of protocols and standards to connect authentication endpoints to services. With an unusual degree of clarity in this field, FIDO envisages simply "doing for authentication what Ethernet did for networking".

Launched in early 2013, the FIDO Alliance has already grown to nearly 100 members, amongst which are heavyweights like Google, Lenovo, MasterCard, Microsoft and PayPal as well as a couple of dozen biometrics vendors, many of the leading Identity and Access Management solutions and service providers and several global players in the smartcard supply chain.

FIDO is different. The typical hackneyed elevator pitch in Identity and Access Management promises to "fix the password crisis" – usually by changing the way business is done. Most IDAM initiatives unwittingly convert clear-cut technology problems into open-ended business transformation problems. In contrast, FIDO's mission is refreshingly clear cut: it seeks to make strong authentication interoperable between devices and servers. When users have activated FIDO-compliant endpoints, reliable fine-grained information about their client environment becomes readily discoverable by any servers, which can then make access control decisions, each according to its own security policy.

With its focus, pragmatism and critical mass, FIDO is justifiably today's go-to authentication standards effort.

In February 2014, the FIDO Alliance announced the release of its first two protocol drafts, and a clutch of new members including powerful players in financial services, the cloud and e-commerce. Constellation notes in particular the addition to the board of security leader RSA and another major payments card company, Discover. And FIDO continues to strengthen its vital “Relying Party” (service provider) representation with the appearance of Aetna, Goldman Sachs, Netflix and Salesforce.com.

It's time we fixed the Authentication plumbing

In my view, the best thing about FIDO is that it is not about federated identity but instead it operates one layer down in what we call the digital identity stack. This might seem to run against the IDAM tide, but it's refreshing, and it may help the FIDO Alliance sidestep the quagmire of identity policy mapping and legal complexities. FIDO is not really about the vexed general issue of "identity" at all! Instead, it's about low level authentication protocols; that is, the plumbing.

The FIDO Alliance sets out its mission as follows:

  • Change the nature of online authentication by:
    • Developing technical specifications that define an open, scalable, interoperable set of mechanisms that reduce the reliance on passwords to authenticate users.
    • Operating industry programs to help ensure successful worldwide adoption of the Specifications.
    • Submitting mature technical Specification(s) to recognized standards development organization(s) for formal standardization.

The engineering problem underlying Federated Identity is actually pretty simple: if we want to have a choice of high-grade physical, multi-factor "keys" used to access remote services, how do we convey reliable cues to those services about the type of key being used and the individual who's said to be using it? If we can solve that problem, then service providers and Relying Parties can sort out for themselves precisely what they need to know about the users, sufficient to identify and authenticate them.

All of this leaves the 'I' in the acronym "FIDO" a little incongruous. It's such a cute name (alluding of course to the Internet dog) that it's unlikely to change. Instead, I overheard that the acronym might go the way of "KFC", eventually no longer spelled out and simply a word in its own right.

FIDO Alliance Board Members

  • Blackberry
  • CrucialTec (manufactures innovative user input devices for mobiles)
  • Discover Card
  • Google
  • Lenovo
  • MasterCard
  • Microsoft
  • Nok Nok Labs (a specialist authentication server software company)
  • NXP Semiconductors (a global supplier of card chips, SIMs and Secure Elements)
  • Oberthur Technologies (a multinational smartcard and mobility solutions provider)
  • PayPal
  • RSA
  • Synaptics (fingerprint biometrics)
  • Yubico (the developer of the YubiKey PKI enabled 2FA token).

FIDO Alliance Sponsor Level Members

  • Aetna
  • ARM
  • AGNITiO
  • Dell
  • Discretix
  • Entersekt
  • EyeLock Inc.
  • Fingerprint Cards AB
  • FingerQ
  • Goldman Sachs
  • IdentityX
  • IDEX ASA
  • Infineon
  • Kili
  • Netflix
  • Next Biometrics Group
  • Oesterreichische Staatsdruckerei GmbH
  • Ping Identity
  • SafeNet
  • Salesforce
  • SecureKey
  • Sonavation
  • STMicroelectronics
  • Wave Systems

Stay tuned for the updated Constellation Research report.

Posted in Smartcards, Security, Identity, Federated Identity, Constellation Research, Biometrics

How are we to defend anonymity?

If anonymity is important, what is the legal basis for defending it?
I find that conventional data privacy law in most places around the world already protects anonymity, insofar as the act of de-anonymization represents an act of PII Collection - the creation of a named record. As such, de-anonymization cannot be lawfully performed without an express need to do so, or consent.

Foreword

Cynics have been asking the same rhetorical question "is privacy dead?" for at least 40 years. Certainly information technology and ubiquitous connectivity have made it nearly impossible to hide, and so anonymity is critically ill. But privacy is not the same thing as secrecy; privacy is a state where those who know us, respect the knowledge they have about us. Privacy generally doesn't require us hiding from anyone; it requires restraint on the part of those who hold Personal Information about us.

The typical public response to data breaches, government surveillance and invasions like social media facial recognition is vociferous. People in general energetically assert their rights not to be tracked online, and not to have their personal information exploited behind their backs. These reactions show that the idea of privacy is alive and well.

The end of anonymity perhaps

Against a backdrop of spying revelations and excesses by social media companies especially in regards to facial recognition, there have been recent calls for a "new jurisprudence of anonymity"; see Yale law professor Jed Rubenfeld writing in the Washington Post of 13 Jan 2014. I wonder if there is another way to crack the nut? Because any new jurisprudence is going to take a very long time.

Instead, I suggest we leverage the way most international privacy law and privacy experience -- going back decades -- is technology neutral with regard to the method of collection. In some jurisdictions, like Australia, the term "collection" is not even defined in privacy law. Instead, the law just uses the normal plain English sense of the word when it frames principles like Collection Limitation: basically, you are not allowed to collect (by any means) Personally Identifiable Information without an express and reasonable need. It means that if PII gets into a data system, the system is accountable under privacy law for that PII, no matter how it got there.

This technology neutral view of PII collection has satisfying ramifications for all the people who intuit that Big Data has got too "creepy". We can argue that if a named record is produced afresh by a Big Data process (especially if that record is produced without the named person being aware of it, and from raw data that was originally collected for some other purpose) then that record has logically been collected. Whether PII is collected directly, or collected indirectly, or is in fact created by an obscure process, privacy law is largely agnostic.

Prof Rubenfeld wrote:

  • "The NSA program isn’t really about gathering data. It's about mining data. All the data are there already, digitally stored and collected by telecom giants, just waiting." [italics in original]

I suggest that the output of the data mining, if it is personally identifiable -- and especially if it has been rendered identifiable by processing previously anonymous raw data -- is a fresh collection by the mining operation. As such, the miners should be accountable for their newly minted PII, just as though they had gathered it directly from the persons concerned.

For now, I don't want to go further and argue the rights and wrongs of surveillance. I just want to show a new way to frame the privacy questions in surveillance and big data, making use of existing jurisprudence. If I am right and the NSA is in effect collecting PII as it goes about its data mining, then that provides a possibly fresh understanding of what's going on, within which we can objectively analyse the rights and wrongs.

I am actually the first to admit that within this frame, the NSA might still be justified in mining data, and there might be no actual technical breach of information privacy law, if for instance the NSA enjoys a law enforcement exemption. These are important questions that need to be debated, but elsewhere (see my recent blog on our preparedness to actually have such a debate). My purpose right now is to frame a way to defend anonymity using as much existing legal infrastructure as possible.

But Collection is not limited everywhere

There is an important legal-technical question in all this: Is the collection of PII actually regulated? In Europe, Australia, New Zealand and in dozens of countries, collection is limited, but in the USA, there is no general restriction against collecting PII. America has no broad data protection law, and in any case, the Fair Information Practice Principles (FIPPs) don't include a Collection Limitation principle.

So there may be few regulations in the USA that would carry my argument there! Nevertheless, surely we can use international jurisprudence in Collection Limitation instead of creating new American jurisprudence around anonymity?

So I'd like to put the following questions to Jed Rubenfeld:


  • Do technology neutral Collection Limitation Principles in theory provide a way to bring de-anonymised data into scope for data privacy laws? Is this a way to address people's concerns with Big Data?
  • How does international jurisprudence around Collection Limitation translate to American schools of legal thought?
  • Does this way of looking at the problem create new impetus for Collection Limitation to be introduced into American privacy principles, especially the FIPPs?

Appendix: "Applying Information Privacy Norms to Re-Identification"

In 2013 I presented some of these ideas to an online symposium at the Harvard Law School Petrie-Flom Center, on the Law, Ethics & Science of Re-identification Demonstrations. What follows is an extract from that presentation, in which I spell out carefully the argument -- which was not obvious to some at the time -- that when genetics researchers combine different data sets to demonstrate re-identification of donated genomic material, they are in effect collecting patient PII. I argue that this type of collection should be subject to ethics committee approval just as if the researchers were collecting the identities from the patients directly.

... I am aware of two distinct re-identification demonstrations that have raised awareness of the issues recently. In the first, Yaniv Erlich [at MIT's Whitehead Institute] used what I understand are new statistical techniques to re-identify a number of subjects who had donated genetic material anonymously to the 1000 Genomes project. He did this by correlating genes in the published anonymous samples with genes in named samples available from genealogical databases. The 1000 Genomes consent form reassured participants that re-identification would be "very hard". In the second notable demo, Latanya Sweeney re-identified volunteers in the Personal Genome Project using her previously published method of matching a few demographic values (such as date of birth, sex and postal code) extracted from the otherwise anonymous records.

A great deal of the debate around these cases has focused on the consent forms and the research subjects’ expectations of anonymity. These are important matters for sure, yet for me the ethical issue in de-anonymisation demonstrations is more about the obligations of third parties doing the identification who had nothing to do with the original informed consent arrangements. The act of recording a person’s name against erstwhile anonymous data represents a collection of personal information. The implications for genomic data re-identification are clear.

Let’s consider Subject S who donates her DNA, ostensibly anonymously, to a Researcher R1, under some consent arrangement which concedes there is a possibility that S will be re-identified. And indeed, some time later, an independent researcher R2 does identify S and links her to the DNA sample. The fact is that R2 has collected personal information about S. If R2 has no relationship with S, then S has not consented to this new collection of her personal information.

Even if the consent form signed at the time of the original collection includes a disclaimer that absolute anonymity cannot be guaranteed, re-identifying the DNA sample later represents a new collection, one that has been undertaken without any consent. Given that S has no knowledge of R2, there can be no implied consent in her original understanding with R1, even if absolute anonymity was disclaimed.

Naturally the re-identification demonstrations have served a purpose. It is undoubtedly important that the limits of anonymity be properly understood, and the work of Yaniv and Latanya contributes to that. Nevertheless, these demonstrations were undertaken without the knowledge, much less the consent, of the individuals concerned. I contend that bioinformaticians using clever techniques to attach names to anonymous samples need ethics approval, just as they would if they were taking fresh samples from the people concerned.

See also my letter to the editor of Science magazine.

Posted in Privacy, Big Data

Security Isn't Secure

That is, information security is not intellectually secure. Almost every precept of orthodox information security is ready for a shake-up. Infosec practices are built on crumbling foundations.

UPDATE: I've been selected to speak on this topic at the 2014 AusCERT Conference - the biggest information security event in Australasia.

The recent tragic experience of data breaches -- at Target, Snapchat, Adobe Systems and RSA to name a very few -- shows that orthodox information security is simply not up to the task of securing serious digital assets. We have to face facts: no amount of today's conventional security is ever going to protect assets worth billions of dollars.

Our approach to infosec is based on old management process standards (which can be traced back to ISO 9000) and a ponderous technology neutrality that overly emphasises people and processes. The things we call "Information Security Management Systems" are actually not systems that any engineer would recognise but instead are flabby sets of documents and audit procedures.

"Continuous security improvement" in reality is continuous document engorgement.

Most ISMSs sit passively on shelves and share drives doing nothing for 12 months, until the next audit, when the papers become the centre of attention (not the actual security). Audit has become a sick joke. ISO 27000 and PCI assessors have the nerve to tell us their work only provides a snapshot, and that if a breach occurs between visits, it's not their fault. In effect they admit that audits do not predict performance between audits. While nobody is looking, our credit card numbers are about as secure as Schrödinger's Cat!

The deep problem is that computer systems have become so very complex and so very fragile that they are not manageable by traditional means. Our standard security tools, including Threat & Risk Assessment and hierarchical layered network design, are rooted in conventional engineering. Failure Modes & Criticality Analysis works well in linear systems, where small perturbations have small effects, but IT is utterly unlike this. The smallest most trivial omission in software or in a server configuration can have dire and unlimited consequences. It's like we're playing Jenga.

Update: Barely a month after I wrote this blog, we heard about the "goto fail" bug in the Apple iOS SSL routines, which resulted from one spurious line of code. It might have been more obvious to the programmer and/or any code reviewer had the code been indented differently or if curly braces were used rigorously.

Security needs to be re-thought from the ground up. We need some bigger ideas.

We need less rigid, less formulaic security management structures, to encourage people at the coal face to exercise their judgement and skill. We need straight-talking CISOs with deep technical experience in how computers really work, not 'suits' more focused on the C-suite than the dev teams. We have to stop writing impenetrable hierarchical security policies and SOPs in the "waterfall" manner that we recognised decades ago does little good in software development. And we need to equate security with software quality and reliability, and demand that adequate time and resources be allowed for the detailed work to be done right.

If we can't protect credit card numbers today, we urgently need to do things differently, standing as we are on the brink of the Internet of Things.

Posted in Software engineering, Security

The Snapchat data breach

Yesterday it was reported by The Verge that anonymous hackers have accessed Snapchat's user database and posted 4.6 million user names and phone numbers. In an apparent effort to soften the blow, two digits of the phone numbers were redacted. So we might assume this is a "white hat" exercise, designed to shame Snapchat into improving their security. Indeed, a few days ago Snapchat themselves said they had been warned of vulnerabilities in their APIs that would allow a mass upload of user records.

The response of many has been, well, so what? Some people have casually likened Snapchat's list to a public White Pages; others have played it down as "just email addresses".

Let's look more closely. The leaked list was not in fact public names and phone numbers; it was user names and phone numbers. User names might often be email addresses, but these are typically aliases; people frequently choose email addresses that reveal little or nothing of their real world identity. We should assume that an obscure email address signals an intent on the individual's part to remain unidentified.

Identity theft has become a highly organised criminal enterprise. Crime gangs patiently acquire multiple data sets over many months, sometimes years, gradually piecing together detailed personal profiles. It's been shown time and time again by privacy researchers (perhaps most notably Latanya Sweeney) that re-identification is enabled by linking diverse data sets. And for this purpose, email addresses and phone numbers are superbly valuable indices for correlating an individual's various records. Your email address is common across most of your social media registrations. And your phone number allows your real name and street address to be looked up from reverse White Pages. So the Snapchat breach could be used to join aliases or email addresses to real names and addresses via the phone numbers. For a social engineering attack on a call centre -- or even to open a new bank account -- an identity thief can go an awful long way with real name, street address, email address and phone number.
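
To illustrate how little machinery such linkage needs, here is a toy sketch in C with entirely invented records: an alias-and-phone-number list of the kind leaked from Snapchat is joined to a reverse-directory list on the shared phone number, attaching a real name and street address to each alias.

    #include <stdio.h>
    #include <string.h>

    /* Toy illustration of re-identification by linkage, using entirely
     * invented data: a leaked alias/phone-number list is joined to a
     * reverse-directory list on the common phone number. */
    struct leaked_record { const char *alias;  const char *phone; };
    struct directory_row { const char *phone;  const char *name; const char *street; };

    int main(void)
    {
        struct leaked_record leak[] = {
            { "kitten_lover_99", "0400 000 123" },
            { "anon_8841",       "0400 000 456" },
        };
        struct directory_row white_pages[] = {
            { "0400 000 123", "Jane Citizen", "1 Example St, Sydney" },
            { "0400 000 789", "John Nguyen",  "2 Sample Rd, Melbourne" },
        };

        for (size_t i = 0; i < sizeof leak / sizeof leak[0]; i++)
            for (size_t j = 0; j < sizeof white_pages / sizeof white_pages[0]; j++)
                if (strcmp(leak[i].phone, white_pages[j].phone) == 0)
                    printf("\"%s\" is %s of %s\n",
                           leak[i].alias, white_pages[j].name, white_pages[j].street);
        return 0;
    }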

I was asked in an interview to compare the theft of phone numbers with the theft of social security numbers. I surprised the interviewer when I said phone numbers are probably even more valuable to the highly organised ID thief, for they can be used to index names in public directories, and to link different data sets, in ways that SSNs (or credit card numbers for that matter) cannot.

So let us start to treat all personal information -- especially when aggregated in bulk -- more seriously! And let's be more cautious in the way we categorise personal or Personally Identifiable Information (PII).

Importantly, most regulatory definitions of PII already embody the proper degree of caution. Look carefully at the US government definition of Personally Identifiable Information:

  • information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual (underline added).

This means that items of data can constitute PII if other data can be combined to identify the person concerned. That is, the fragments are regarded as PII even if it is the whole that does the identifying.

And remember that the middle I in PII stands for Identifiable, and not, as many people presume, Identifying. To meet the definition of PII, data need not uniquely identify a person, it merely needs to be directly or indirectly identifiable with a person. And this is how it should be when we heed the way information technologies enable identification through linkages.

Almost anywhere else in the world, data stores like Snapchat's would automatically fall under data protection and information privacy laws; regulators would take a close look at whether the company had complied with the OECD Privacy Principles, and whether Snapchat's security measures were fit for purpose given the PII concerned. But in the USA, companies and commentators alike still have trouble working out how serious these breaches are. Each new breach is treated in an ad hoc manner, often with people finessing the difference between credit card numbers -- as in the recent Target breach -- and "mere" email addresses like those in the Snapchat and Epsilon episodes.

Surely the time has come to simply give proper regulatory protection to all PII.

Posted in Social Networking, Social Media, Security, Privacy, Identity, Fraud, Big Data

The ROI for breaching Target

An unhappy holiday for Target customers

A week before Christmas, Target in the US revealed it had suffered a massive payment card data breach, with some 40 million customers affected. Details of the breach are still emerging. No well-informed criticism of Target's security has yet emerged; instead most observers say that Target has very serious security, and therefore this latest attack must have been very sophisticated, or else an inside job. It appears Target was deemed PCI-DSS compliant -- which only goes to prove yet again the futility of the PCI audit regime for deterring organized criminals.

Security analyst Brian Krebs has already seen evidence of a "fire sale" on carding sites. Cardholder records are worth several dollars each, up to $44 according to Krebs for "fresh" accounts. So the Return on Investment for really big attacks like this one on Target (and before that, on Adobe, Heartland Payment Systems, TJ Maxx and Sony) can approach one billion dollars.

We have to face the fact that no amount of conventional IT security can protect a digital asset worth a billion dollars. Conventional security can repel amateur attacks and prevent accidental losses, but security policies, audits and firewalls are not up to the job when a determined thief knows what they're looking for.

It's high time that we rendered payment card data immune to criminal reuse. This is not a difficult technological problem; it's been solved before in Card Present transactions around the world, and with a little will power, the payments industry could do it again for Internet payments, nullifying the black market in stolen card data.

A history of strong standardisation

The credit card payments system is a paragon of standardisation. No other industry has such a strong history of driving and adopting uniform technologies, infrastructure and business processes. No matter where you keep a bank account, you can use a globally branded credit card to go shopping in almost every corner of the planet. This seamless interoperability is created by the universal Four Party settlement model, and a long-standing plastic card standard that works the same with ATMs and merchant terminals absolutely everywhere.

So with this determination to facilitate trustworthy and supremely convenient spending worldwide, it's astonishing that the industry has yet to standardise Internet payments! We have for the most part settled on the EMV chip card standard for in-store transactions, but online we use a wide range of confusing, piecemeal and largely ineffective security measures. As a result, Card Not Present (CNP) fraud has boomed. I argue that all card payments -- offline and online -- should be properly secured using standardised hardware. In particular, CNP transactions should either use the very same EMV chip and cryptography as Card Present payments, or exploit the capabilities of mobile handsets, especially their Secure Elements.

CNP Fraud trends

The Australian Payments Clearing Association (APCA) releases twice-yearly card fraud statistics, broken down by fraud type: skimming & carding, Card Not Present, stolen cards and so on. Lockstep Consulting monitors the APCA releases and compiles a longitudinal series. The latest Australian card fraud figures are shown below.

[Figure: Australian CNP fraud trends to FY 2013]


APCA, like other regulators, tends to put a gloss on the rise in CNP fraud, saying it's smaller than the overall rise in e-commerce. There are several ways to interpret this contextualization. The population-wide systemic advantages of e-commerce can indeed be said to outweigh the fraud costs, yet this leaves the underlying vulnerability to payments fraud unaddressed, and ignores the qualitative problems suffered by the individual victims of fraud (as they say, history is written by the winners). It's pretty complacent to play down fraud as being small compared with the systemic benefit of shopping online; it would be like meekly attributing a high road toll to the popularity of motor cars. At some point, we have to do something about safety!

[And note very carefully that online fraud and online shopping are not in fact two sides of the same coin. Criminals obtain most of their stolen card data from offline retail and processing environments. It's a bit rude to argue CNP fraud is small as a proportion of e-commerce when some people who suffer from stolen card data might have never shopped online in their lives!]

Frankly it's a mystery why the payments industry seems so bamboozled by CNP fraud, because technically it's a very simple problem. And it's one we've already solved elsewhere. For Card Not Present fraud is simply online carding.

Skimming and Carding

In carding, criminals replicate stolen customer data on blank cards; with CNP fraud they replay stolen data on merchant servers.

A magstripe card stores the customer's details as a string of ones and zeroes, and presents them to a POS terminal or ATM in the clear. It's child's play for criminals to scan the bits and copy them to a blank card.

The payments industry responded to skimming and carding with EMV (aka Chip-and-PIN). EMV replaces the magnetic storage with an integrated circuit, but more importantly, it secures the data transmitted from card to terminal. EMV works by first digitally signing those ones and zeros in the chip, and then verifying the signature at the terminal. The signing uses a Private Key unique to the cardholder and held safely inside the chip where it cannot be tampered with by fraudsters. It is not feasible to replicate the digital signature without having access to the inner workings of the chip, and thus EMV cards resist carding.

Online card fraud

Conventional Card Not Present (CNP) transactions are vulnerable because, like the old magstripe cards themselves, they rest on cleartext cardholder data. On its own, a merchant server cannot tell the difference between the original card data and a copy, just as a terminal cannot tell an original magstripe card from a criminal's copy.

Despite the simplicity of the root problem, the past decade has seen a bewildering patchwork of flimsy and expensive online payments fixes. Various One Time Passwords have come and gone, from scratchy cards to electronic key fobs. Temporary SMS codes have been popular for two-step verification of transactions but were recently declared unfit for purpose by the Communications Alliance in Australia, a policy body representing the major mobile carriers.

Meanwhile, extraordinary resources have been squandered on the novel "3D Secure" scheme (MasterCard SecureCode and Verified by Visa). 3D Secure take-up is piecemeal; it's widely derided by merchants and customers alike. It upsets the underlying Four Party settlements architecture, slowing transactions to a crawl and introducing untold legal complexities.

A solution is at hand -- we've done it before

Why doesn't the card payments industry go back to its roots, preserve its global architecture and standards, and tackle the real issue? We could stop most online fraud by using the same chip technologies we deployed to kill off skimming.

It is technically simple to reproduce the familiar card-present user experience in a standard computer or in digital form on a smart phone. It would just take the will of the financial services industry to standardise digital signatures on payment messages sent from a card holder's device or browser to a merchant server.

And there is ample room for innovative payments modalities in online and mobile commerce settings:

  • A smart phone can hold a digital wallet of keys corresponding to the owner's cards; the keys can be invoked by a payments app, ideally inside a Secure Element in the handset, to digitally sign each payment, preventing tampering, theft and replay.

  • A tablet computer or smart phone can interface a conventional contactless payment card over the NFC (Near Field Communications) channel and use that card to sign transactions (see also the NFC interface demo by IBM Research).

  • Many laptop computers feature smartcard readers (some like the Dell e-series Latitudes even have contactless readers) which could accept conventional credit or debit cards.

Conclusion

All serious payments systems use hardware security. The classic examples include SIM cards, EMV, the Hardware Security Modules mandated by regulators in all ATMs, and the Secure Elements of NFC mobile devices. With well-designed hardware security, we gain a lasting upper hand in the cybercrime arms race.

The Internet and mobile channels will one day overtake the traditional physical payments medium. Indeed, commentators already like to say that the "digital economy" is simply the economy. Therefore, let us stop struggling with stopgap Internet security measures, and let us stop pretending that PCI-DSS audits will stop organised crime stealing card numbers by the million. Instead, we should kill two birds with one stone, and use chip technology to secure both Card Present and CNP transactions, to deliver the same high standards of usability and security in all channels.

Until we render stolen card data useless to criminals, the Return on Investment will remain high for even very sophisticated attacks (or simply bribing insiders), and spectacular data breaches like Target's will continue.

Posted in Smartcards, Security, Payments, Fraud

Facebook's challenge to the Collection Limitation Principle

An extract from our chapter in the forthcoming Encyclopedia of Social Network Analysis and Mining (to be published by Springer in 2014).

Stephen Wilson, Lockstep Consulting, Sydney, Australia.
Anna Johnston, Salinger Privacy, Sydney, Australia.

Key Points

  • Facebook's business practices pose a risk of non-compliance with the Collection Limitation Principle (OECD Privacy Principle No. 1, and corresponding Australian National Privacy Principles NPP 1.1 through 1.4).
  • Privacy problems will likely remain while Facebook's business model remains unsettled, for the business is largely based on collecting and creating as much Personal Information as it can, for subsequent and as yet unspecified monetization.
  • If an OSN business doesn't know how it is eventually going to make money from Personal Information, then it has a fundamental difficulty with the Collection Limitation principle.

    Introduction

    Facebook is an Internet and societal phenomenon. Launched in 2004, in just a few years it has claimed a significant proportion of the world's population as regular users, becoming by far the most dominant Online Social Network (OSN). With its success has come a good deal of controversy, especially over privacy. Does Facebook herald a true shift in privacy values? Or, despite occasional reckless revelations, are most users no more promiscuous than they were eight years ago? We argue it's too early to draw conclusions about society as a whole from the OSN experience to date. In fact, under laws that currently stand, many OSNs face a number of compliance risks in dozens of jurisdictions.

    Over 80 countries worldwide have now enacted data privacy laws, around half of which are based on privacy principles articulated by the OECD. Amongst these are the Collection Limitation Principle, which requires businesses not to gather more Personal Information than they need for the tasks at hand, and the Use Limitation Principle, which dictates that Personal Information collected for one purpose not be arbitrarily used for others without consent.

    Overt collection, covert collection (including generation) and "innovative" secondary use of Personal Information are the lifeblood of Facebook. While Facebook's founder would have us believe that social mores have changed, a clash with orthodox data privacy laws creates challenges for the OSN business model in general.

    This article examines a number of areas of privacy compliance risk for Facebook. We focus on how Facebook collects Personal Information indirectly, through the import of members' email address books for "finding friends", and by photo tagging. Taking Australia's National Privacy Principles from the Privacy Act 1988 (Cth) as our guide, we identify a number of potential breaches of privacy law, and issues that may be generalised across all OECD-based privacy environments.

    Terminology

    Australian law tends to use the term "Personal Information" rather than "Personally Identifiable Information" although they are essentially synonymous for our purposes.

    Terms of reference: OECD Privacy Principles and Australian law

    The Organisation for Economic Co-operation and Development has articulated eight privacy principles to help protect personal information. The OECD Privacy Principles are as follows:

    • 1. Collection Limitation Principle
    • 2. Data Quality Principle
    • 3. Purpose Specification Principle
    • 4. Use Limitation Principle
    • 5. Security Safeguards Principle
    • 6. Openness Principle
    • 7. Individual Participation Principle
    • 8. Accountability Principle

    Of most interest to us here are principles one and four:

    • Collection Limitation Principle: There should be limits to the collection of personal data and any such data should be obtained by lawful and fair means and, where appropriate, with the knowledge or consent of the data subject.
    • Use Limitation Principle: Personal data should not be disclosed, made available or otherwise used for purposes other than those specified in accordance with [the Purpose Specification] except with the consent of the data subject, or by the authority of law.

    At least 89 countries have some sort of data protection legislation in place [Greenleaf, 2012]. Of these, in excess of 30 jurisdictions have derived their particular privacy regulations from the OECD principles. One example is Australia.

    We will use Australia's National Privacy Principles (NPPs) in the Privacy Act 1988 as our terms of reference for analysing some of Facebook's systemic privacy issues. In Australia, Personal Information is defined as: "information or an opinion (including information or an opinion forming part of a database), whether true or not, and whether recorded in a material form or not, about an individual whose identity is apparent, or can reasonably be ascertained, from the information or opinion".

    Indirect collection of contacts

    One of the most significant collections of Personal Information by Facebook is surely the email address book of those members who elect to have the site help "find friends". This facility provides Facebook with a copy of all contacts from the address book of the member's nominated email account, and it is the very first thing a new user is invited to do when they register. Facebook refers to this as "contact import" in its Data Use Policy (accessed 10 August 2012).

    "Find friends" is curtly described as "Search your email for friends already on Facebook". A link labelled "Learn more" in fine print leads to the following additional explanation:

    • "Facebook won't share the email addresses you import with anyone, but we will store them on your behalf and may use them later to help others search for people or to generate friend suggestions for you and others. Depending on your email provider, addresses from your contacts list and mail folders may be imported. You should only import contacts from accounts you've set up for personal use." [underline added by us].

    Without any further elaboration, new users are invited to enter their email address and password if they have a cloud-based email account (such as Hotmail, Gmail, Yahoo and the like). These services expose APIs through which any third party application can programmatically access the account once presented with the username and password.
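
    As a purely generic illustration of how little such access requires (the host name, account details and function name below are hypothetical, and this is not a description of Facebook's actual code), a few lines of Python over standard IMAP are enough to enumerate every address a person has corresponded with:

        # Generic sketch: once a user hands over webmail credentials, a third party
        # can enumerate correspondents over standard IMAP. Hypothetical host and
        # account details; not Facebook's actual implementation.
        import email
        import imaplib
        from email.utils import getaddresses

        def harvest_addresses(imap_host, username, password, max_messages=200):
            addresses = set()
            with imaplib.IMAP4_SSL(imap_host) as conn:
                conn.login(username, password)        # the credentials the user typed in
                conn.select("INBOX", readonly=True)
                _, data = conn.search(None, "ALL")
                for msg_id in data[0].split()[-max_messages:]:
                    _, parts = conn.fetch(msg_id.decode(), "(BODY.PEEK[HEADER])")
                    headers = email.message_from_bytes(parts[0][1])
                    fields = (headers.get_all("From", []) +
                              headers.get_all("To", []) +
                              headers.get_all("Cc", []))
                    for _, addr in getaddresses(fields):
                        if addr:
                            addresses.add(addr.lower())
            return addresses

        # e.g. contacts = harvest_addresses("imap.example.com", "someone@example.com", "password")

    The point is not that Facebook's code looks like this, but that once credentials are handed over, collection of this kind is trivial to implement and sweeps up Personal Information about people who never consented to, or even know of, the collection.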

    It is entirely possible that casual users will not fully comprehend what is happening when they opt in to have Facebook "find friends". Further, users are given no indication that, by default, imported contact details are shared with everyone. The underlined text in the passage quoted above shows that Facebook reserves the right to use imported contacts to make direct approaches to people who might not even be members.

    Importing contacts represents an indirect collection by Facebook of Personal Information of others, without their authorisation or even knowledge. The short explanatory information quoted above is not provided to the individuals whose details are imported and therefore does not constitute a Collection Notice. Furthermore, it leaves the door open for Facebook to use imported contacts for other, unspecified purposes. The Data Use Policy imposes no limitations as to how Facebook may make use of imported contacts.

    Privacy harms are possible in social networking when members blur the distinction between work and private lives. Recent research has pointed to the risky use of Facebook by young doctors, involving inappropriate discussion of patients [Moubarak et al, 2010]. Even if doctors are discreet in their online chat, we are concerned that they may fall foul of the Find Friends feature exposing their connections with named patients. Doctors on Facebook who happen to have patients in their web mail address books can find the association between patient and doctor made public. In mental health, sexual health, family planning, substance abuse and similar sensitive fields, naming patients could be catastrophic for them.

    While most healthcare professionals may use a specific workplace email account which would not be amenable to contacts import, many allied health professionals, counsellors, specialists and the like run their sole practices as small businesses, and naturally some will use low cost or free cloud-based email services. Note that the substance of a doctor's communications with their patients over web mail is not at issue here. The problem of exposing associations between patients and doctors arises simply from the presence of a name in an address book, even if the email account was only ever used for non-clinical purposes such as appointments or marketing.

    Photo tagging and biometric facial recognition

    One of Facebook's most "innovative" forms of Personal Information Collection would have to be photo tagging and the creation of biometric facial recognition templates.

    Photo tagging and "face matching" have been available in social media for some years now. On photo sharing sites such as Picasa, this technology "lets you organize your photos according to the people in them", in the words of the Picasa help pages. But in more complicated OSN settings, biometrics has enormous potential both to enhance the services on offer and to breach privacy.

    In thinking about facial recognition, we start once more with the Collection Principle. Importantly, nothing in the Australian Privacy Act circumscribes the manner of collection; no matter how a data custodian comes to be in possession of Personal Information (being essentially any data about a person whose identity is apparent) they may be deemed to have collected it. When one Facebook member tags another in a photo on the site, then the result is that Facebook has overtly but indirectly collected PI about the tagged person.

    Facial recognition technologies are deployed within Facebook to allow its servers to automatically make tag suggestions; in our view this process constitutes a new type of Personal Information Collection, on a potentially vast scale.

    Biometric facial recognition works by processing image data to extract certain distinguishing features (like the separation of the eyes, nose, ears and so on) and computing a numerical data set known as a template that is highly specific to the face, though not necessarily unique. Facebook's online help indicates that templates are created from multiple tagged photos; if a user removes a tag from one of their photos, that image is no longer used in the template.

    Facebook subsequently makes tag suggestions when a member views photos of their friends. They explain the process thus:

    • "We are able to suggest that your friend tag you in a picture by scanning and comparing your friend‘s pictures to information we've put together from the other photos you've been tagged in".

    So we see that Facebook must be more or less continuously checking images from members' photo albums against its store of facial recognition templates. When a match is detected, a tag suggestion is generated and logged, ready to be displayed next time the member is online.
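
    To make the mechanics concrete, the sketch below shows in general terms how templates and tag suggestions can work. The extract_features function is a deliberately crude stand-in for a real feature extractor, and the function names and threshold are our own assumptions; this illustrates the general technique, not Facebook's implementation.

        # Illustrative sketch of biometric template creation and matching.
        # extract_features is a toy stand-in for a real facial feature extractor;
        # this is not Facebook's implementation.
        import numpy as np

        def extract_features(face_image, grid=4):
            """Toy feature extractor: average pixel intensity over a grid of regions,
            giving a fixed-length, normalised vector for each face image."""
            img = np.asarray(face_image, dtype=float)
            h, w = img.shape[:2]
            cells = [img[r * h // grid:(r + 1) * h // grid,
                         c * w // grid:(c + 1) * w // grid].mean()
                     for r in range(grid) for c in range(grid)]
            vec = np.array(cells)
            return vec / (np.linalg.norm(vec) + 1e-9)

        def build_template(tagged_photos):
            """Average the vectors from the photos a member is tagged in; removing
            a tag simply drops that photo from the template."""
            return np.mean([extract_features(p) for p in tagged_photos], axis=0)

        def suggest_tags(faces_in_new_photo, templates, threshold=0.1):
            """Compare each face found in a newly uploaded photo against the stored
            templates; a close enough match becomes a (face, name) tag suggestion."""
            suggestions = []
            for face in faces_in_new_photo:
                vec = extract_features(face)
                for member_name, template in templates.items():
                    if np.linalg.norm(vec - template) < threshold:
                        suggestions.append((face, member_name))
            return suggestions

    Whatever the real extractor looks like, the privacy-significant step is the last one: a name is attached, even tentatively, to a face in someone else's photograph.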

    What concerns us is that the proactive creation of biometric matches constitutes a new type of PI Collection, for Facebook must be attaching names -- even tentatively, as metadata -- to photos. This is a covert and indirect process.

    Photos of anonymous strangers are not Personal Information, but metadata that identifies people in those photos most certainly is. Thus facial recognition is converting hitherto anonymous data -- uploaded in the past for personal reasons unrelated to photo tagging let alone covert identification -- into Personal Information.

    Facebook limits the ability to tag photos to members who are friends of the target. This is purportedly a privacy enhancing feature, but unfortunately Facebook has nothing in its Data Use Policy to limit the use of the biometric data compiled through tagging. Restricting tagging to friends is likely to actually benefit Facebook for it reduces the number of specious or mischievous tags, and it probably enhances accuracy by having faces identified only by those who know the individuals.

    A fundamental clash with the Collection Limitation Principle

    In Australian privacy law, as with the OECD framework, the first and foremost privacy principle concerns Collection. Australia's National Privacy Principle NPP 1 requires that an organisation refrain from collecting Personal Information unless (a) there is a clear need to collect that information; (b) the collection is done by fair means; and (c) the individual concerned is made aware of the collection and the reasons for it.

    In accordance with the Collection Principle (and others besides), a conventional privacy notice and/or privacy policy must give a full account of what Personal Information an organisation collects (including that which it creates internally) and for what purposes. And herein lies a fundamental challenge for most online social networks.

    The core business model of many Online Social Networks is to take advantage of Personal Information, in many and varied ways. From the outset, Facebook founder, Mark Zuckerberg, appears to have been enthusiastic for information built up in his system to be used by others. In 2004, he told a colleague "if you ever need info about anyone at Harvard, just ask" (as reported by Business Insider). Since then, Facebook has experienced a string of privacy controversies, including the "Beacon" sharing feature in 2007, which automatically imported members' activities on external websites and re-posted the information on Facebook for others to see.

    Facebook's privacy missteps are characterised by the company using the data it collects in unforeseen and barely disclosed ways. Yet this is surely what Facebook's investors expect the company to be doing: innovating in the commercial exploitation of personal information. The company's huge market valuation derives from a widespread faith in the business community that Facebook will eventually generate huge revenues. An inherent clash with privacy arises from the fact that Facebook is a pure play information company: its only significant asset is the information it holds about its members. There is a market expectation that this asset will be monetised and maximised. Logically, anything that checks the network's flux of Personal Information -- such as the restraints inherent in privacy protection, whether adopted from within or imposed from without -- must affect the company's future.

    Conclusion

    Perhaps the toughest privacy dilemma for innovation in commercial Online Social Networking is that these businesses still don't know how they are going to make money from their Personal Information lode. Even if they wanted to, they cannot tell what use they will eventually make of it, and so a fundamental clash with the Collection Limitation Principle remains.

    Acknowledgements

    An earlier version of this article was originally published by LexisNexis in the Privacy Law Bulletin (2010).

    References

    • Greenleaf G., "Global Data Privacy Laws: 89 Countries, and Accelerating", Privacy Laws & Business International Report, Issue 115, Special Supplement, February 2012; Queen Mary School of Law Legal Studies Research Paper No. 98/2012.
    • Moubarak G., Guiot A., et al., "Facebook activity of residents and fellows and its impact on the doctor-patient relationship", J Med Ethics, 15 December 2010.

    Posted in Social Networking, Social Media, Privacy