The strengths and weaknesses of Data Privacy in the Age of Big Data

This is the abstract of a current privacy conference proposal.

Synopsis

Many Big Data and online businesses proceed on a naive assumption that data in the "public domain" is up for grabs; technocrats are often surprised that conventional data protection laws can be interpreted to cover the extraction of PII from raw data. On the other hand, orthodox privacy frameworks don't cater for the way PII can be created in future from raw data collected today. This presentation will bridge the conceptual gap between data analytics and privacy, and offer new dynamic consent models to civilize the trade in PII for goods and services.

Abstract

It’s often said that technology has outpaced privacy law, yet by and large that's just not the case. Technology has certainly outpaced decency, with Big Data and biometrics in particular becoming increasingly invasive. However, the OECD data privacy principles set out over thirty years ago still serve us well. Outside the US, rights-based privacy law has proven effective against today's technocrats' most worrying business practices, based as they are on taking liberties with any data that comes their way. To borrow from Niels Bohr, technologists who are not surprised by data privacy have probably not understood it.

The cornerstone of data privacy in most places is the Collection Limitation principle, which holds that organizations should not collect Personally Identifiable Information beyond their express needs. It is the conceptual cousin of security's core Need-to-Know Principle, and the best starting point for Privacy-by-Design. The Collection Limitation principle is technology neutral and thus blind to the manner of collection. Whether PII is collected directly by questionnaire or indirectly via biometric facial recognition or data mining, data privacy laws apply.

It’s not for nothing we refer to "data mining". But few of those unlicensed data gold diggers seem to understand that the synthesis of fresh PII from raw data (including the identification of anonymous records like photos) is merely another form of collection. The real challenge in Big Data is that we don’t know what we’ll find as we refine the techniques. With the best will in the world, it is hard to disclose in a conventional Privacy Policy what PII might be collected through synthesis down the track. The age of Big Data demands a new privacy compact between organisations and individuals. High minded organizations will promise to keep people abreast of new collections and will offer ways to opt in, and out, and in again, as the PII-for-service bargain continues to evolve.

Posted in Social Networking, Privacy, Biometrics, Big Data

The Snapchat data breach

Yesterday The Verge reported that anonymous hackers had accessed Snapchat's user database and posted 4.6 million user names and phone numbers. In an apparent effort to soften the blow, two digits of each phone number were redacted. So we might assume this is a "white hat" exercise, designed to shame Snapchat into improving their security. Indeed, a few days ago Snapchat themselves said they had been warned of vulnerabilities in their APIs that would allow user records to be harvested in bulk.

The response of many has been, well, so what? Some people have casually likened Snapchat's list to a public White Pages; others have played it down as "just email addresses".

Let's look more closely. The leaked list was not in fact public names and phone numbers; it was user names and phone numbers. User names might often be email addresses but these are typically aliases; people frequently choose email addresses that reveal little or nothing of their real world identity. Where a user has chosen an obscure email address, we should assume they intend to remain unidentified.

Identity theft has become a highly organised criminal enterprise. Crime gangs patiently acquire multiple data sets over many months, sometimes years, gradually piecing together detailed personal profiles. It's been shown time and time again by privacy researchers (perhaps most notably Latanya Sweeney) that re-identification is enabled by linking diverse data sets. And for this purpose, email addresses and phone numbers are superbly valuable indices for correlating an individual's various records. Your email address is common across most of your social media registrations. And your phone number allows your real name and street address to be looked up from reverse White Pages. So the Snapchat breach could be used to join aliases or email addresses to real names and addresses via the phone numbers. For a social engineering attack on a call centre -- or even to open a new bank account -- an identity thief can go an awful long way with real name, street address, email address and phone number.
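
To make the linkage mechanics concrete, here is a minimal sketch of such a join. All of the names, numbers and addresses are made up for illustration; nothing here is from any real breach:

    # Sketch: how a phone number joins an "anonymous" alias to a real identity.
    # All names, numbers and addresses below are invented for illustration.

    leaked_list = [
        {"username": "jane_x", "phone": "555-0142"},
        {"username": "ghost_rider_99", "phone": "555-0177"},
    ]

    reverse_directory = {  # e.g. compiled from a reverse White Pages
        "555-0142": {"name": "Jane Citizen", "address": "1 Example St"},
    }

    other_registrations = [  # the same alias re-used across social media
        {"site": "photo-sharing", "handle": "jane_x"},
    ]

    for row in leaked_list:
        identity = reverse_directory.get(row["phone"])
        if identity:
            # The phone number yields a real name and street address;
            # the shared alias then pulls in records from other data sets.
            linked = [r for r in other_registrations
                      if r["handle"] == row["username"]]
            print(row["username"], "->", identity["name"],
                  identity["address"], "| linked accounts:", linked)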

I was asked in an interview to compare the theft of phone numbers with the theft of social security numbers. I surprised the interviewer when I said phone numbers are probably even more valuable to the highly organised ID thief, for they can be used to index names in public directories, and to link different data sets, in ways that SSNs (or credit card numbers for that matter) cannot.

So let us start to treat all personal information -- especially when aggregated in bulk -- more seriously! And let's be more cautious in the way we categorise personal or Personally Identifiable Information (PII).

Importantly, most regulatory definitions of PII already embody the proper degree of caution. Look carefully at the US government definition of Personally Identifiable Information:

  • information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual (underline added).

This means that items of data can constitute PII if other data can be combined to identify the person concerned. That is, the fragments are regarded as PII even if it is the whole that does the identifying.

And remember that the middle I in PII stands for Identifiable, and not, as many people presume, Identifying. To meet the definition of PII, data need not uniquely identify a person; it merely needs to be directly or indirectly identifiable with a person. And so it should be, given the way information technologies enable identification through linkage.

Almost anywhere else in the world, data stores like Snapchat's would automatically fall under data protection and information privacy laws; regulators would take a close look at whether the company had complied with the OECD Privacy Principles, and whether Snapchat's security measures were fit for purpose given the PII concerned. But in the USA, companies and commentators alike still have trouble working out how serious these breaches are. Each new breach is treated in an ad hoc manner, often with people finessing the difference between credit card numbers -- as in the recent Target breach -- and "mere" email addresses like those in the Snapchat and Epsilon episodes.

Surely the time has come to simply give proper regulatory protection to all PII.

Posted in Social Networking, Social Media, Security, Privacy, Identity, Fraud, Big Data

Facebook's challenge to the Collection Limitation Principle

An extract from our chapter in the forthcoming Encyclopedia of Social Network Analysis and Mining (to be published by Springer in 2014).

Stephen Wilson, Lockstep Consulting, Sydney, Australia.
Anna Johnston, Salinger Privacy, Sydney, Australia.

Key Points

  • Facebook's business practices pose a risk of non-compliance with the Collection Limitation Principle (OECD Privacy Principle No. 1, and corresponding Australian National Privacy Principles NPP 1.1 through 1.4).
  • Privacy problems will likely remain while Facebook's business model remains unsettled, for the business is largely based on collecting and creating as much Personal Information as it can, for subsequent and as yet unspecified monetization.
  • If an OSN business doesn't know how it is eventually going to make money from Personal Information, then it has a fundamental difficulty with the Collection Limitation principle.

Introduction

Facebook is an Internet and societal phenomenon. Launched in 2004, in just a few years it has claimed a significant proportion of the world's population as regular users, becoming by far the most dominant Online Social Network (OSN). With its success has come a good deal of controversy, especially over privacy. Does Facebook herald a true shift in privacy values? Or, despite occasional reckless revelations, are most users no more promiscuous than they were eight years ago? We argue it's too early to draw conclusions about society as a whole from the OSN experience to date. In fact, under laws that currently stand, many OSNs face a number of compliance risks in dozens of jurisdictions.

Over 80 countries worldwide now have enacted data privacy laws, around half of which are based on privacy principles articulated by the OECD. Amongst these are the Collection Limitation Principle which requires businesses to not gather more Personal Information than they need for the tasks at hand, and the Use Limitation Principle which dictates that Personal Information collected for one purpose not be arbitrarily used for others without consent.

Overt collection, covert collection (including generation) and "innovative" secondary use of Personal Information are the lifeblood of Facebook. While Facebook's founder would have us believe that social mores have changed, a clash with orthodox data privacy laws creates challenges for the OSN business model in general.

This article examines a number of areas of privacy compliance risk for Facebook. We focus on how Facebook collects Personal Information indirectly, through the import of members' email address books for "finding friends", and by photo tagging. Taking Australia's National Privacy Principles from the Privacy Act 1988 (Cth) as our guide, we identify a number of potential breaches of privacy law, and issues that may be generalised across all OECD-based privacy environments.

Terminology

Australian law tends to use the term "Personal Information" rather than "Personally Identifiable Information" although they are essentially synonymous for our purposes.

Terms of reference: OECD Privacy Principles and Australian law

The Organisation for Economic Cooperation and Development has articulated eight privacy principles for helping to protect personal information. The OECD Privacy Principles are as follows:

  • 1. Collection Limitation Principle
  • 2. Data Quality Principle
  • 3. Purpose Specification Principle
  • 4. Use Limitation Principle
  • 5. Security Safeguards Principle
  • 6. Openness Principle
  • 7. Individual Participation Principle
  • 8. Accountability Principle

Of most interest to us here are principles one and four:

  • Collection Limitation Principle: There should be limits to the collection of personal data and any such data should be obtained by lawful and fair means and, where appropriate, with the knowledge or consent of the data subject.
  • Use Limitation Principle: Personal data should not be disclosed, made available or otherwise used for purposes other than those specified in accordance with [the Purpose Specification] except with the consent of the data subject, or by the authority of law.

At least 89 countries have some sort of data protection legislation in place [Greenleaf, 2012]. Of these, in excess of 30 jurisdictions have derived their particular privacy regulations from the OECD principles. One example is Australia.

We will use Australia's National Privacy Principles (NPPs) in the Privacy Act 1988 as our terms of reference for analysing some of Facebook's systemic privacy issues. In Australia, Personal Information is defined as "information or an opinion (including information or an opinion forming part of a database), whether true or not, and whether recorded in a material form or not, about an individual whose identity is apparent, or can reasonably be ascertained, from the information or opinion".

Indirect collection of contacts

One of the most significant collections of Personal Information by Facebook is surely the email address book of those members that elect to have the site help "find friends". This facility provides Facebook with a copy of all contacts from the address book of the member's nominated email account. It's the very first thing that a new user is invited to do when they register. Facebook refer to this as "contact import" in the Data Use Policy (accessed 10 August 2012).

"Find friends" is curtly described as "Search your email for friends already on Facebook". A link labelled "Learn more" in fine print leads to the following additional explanation:

  • "Facebook won't share the email addresses you import with anyone, but we will store them on your behalf and may use them later to help others search for people or to generate friend suggestions for you and others. Depending on your email provider, addresses from your contacts list and mail folders may be imported. You should only import contacts from accounts you've set up for personal use." [underline added by us].

Without any further elaboration, new users are invited to enter their email address and password if they have a cloud-based email account (such as Hotmail, Gmail, Yahoo and the like). These services expose APIs through which any third-party application can programmatically access the account, after presenting the user name and password.
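
To illustrate the reach this grants, here is a minimal sketch of what any program holding a user's webmail credentials can do, using only the Python standard library over IMAP. The host and folder names are hypothetical, and real "find friends" importers used provider-specific contact APIs or similar mechanisms, but the principle is the same:

    # Sketch: harvesting correspondent addresses from a mail folder,
    # given only the username and password the user typed in.
    # Host and folder names are hypothetical.
    import imaplib
    import email
    from email.utils import getaddresses

    def harvest_contacts(host, username, password, limit=200):
        """Log in as the user and collect email addresses from headers."""
        contacts = set()
        with imaplib.IMAP4_SSL(host) as conn:
            conn.login(username, password)   # the user's own credentials
            conn.select("INBOX", readonly=True)
            _, data = conn.search(None, "ALL")
            for msg_id in data[0].split()[-limit:]:
                _, msg_data = conn.fetch(msg_id, "(BODY.PEEK[HEADER])")
                headers = email.message_from_bytes(msg_data[0][1])
                fields = headers.get_all("From", []) + headers.get_all("To", [])
                for _name, addr in getaddresses(fields):
                    contacts.add(addr.lower())
        return contacts

    # e.g. harvest_contacts("imap.example.com", "jane@example.com", "secret")

Note that every address appearing in the folder is swept up, whether or not its owner has ever dealt with the importing business.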

It is entirely possible that casual users will not fully comprehend what is happening when they opt in to have Facebook "find friends". Further, there is no indication that, by default, imported contact details are shared with everyone. The underlined text in the passage quoted above shows Facebook reserves the right to use imported contacts to make direct approaches to people who might not even be members.

Importing contacts represents an indirect collection by Facebook of Personal Information of others, without their authorisation or even knowledge. The short explanatory information quoted above is not provided to the individuals whose details are imported and therefore does not constitute a Collection Notice. Furthermore, it leaves the door open for Facebook to use imported contacts for other, unspecified purposes. The Data Use Policy imposes no limitations as to how Facebook may make use of imported contacts.

Privacy harms are possible in social networking if members blur the distinction between work and private lives. Recent research has pointed to the risky use of Facebook by young doctors, involving inappropriate discussion of patients [Moubarak et al, 2010]. Even if doctors are discreet in their online chat, we are concerned that they may run foul of the Find Friends feature exposing their connections to named patients. If a doctor on Facebook happens to have patients in their web mail address book, the association between those patients and their doctor can become public. In mental health, sexual health, family planning, substance abuse and similar sensitive fields, naming patients could be catastrophic for them.

While most healthcare professionals may use a specific workplace email account which would not be amenable to contacts import, many allied health professionals, counselors, specialists and the like run their sole practices as small businesses, and naturally some will use low cost or free cloud-based email services. Note that the substance of a doctor's communications with their patients over web mail is not at issue here. The problem of exposing associations between patients and doctors arises simply from the presence of a name in an address book, even if the email was only ever used for non-clinical purposes such as appointments or marketing.

Photo tagging and biometric facial recognition

One of Facebook's most "innovative" forms of Personal Information Collection would have to be photo tagging and the creation of biometric facial recognition templates.

Photo tagging and "face matching" has been available in social media for some years now. On photo sharing sites such as Picasa, this technology "lets you organize your photos according to the people in them" in the words of the Picasa help pages. But in more complicated OSN settings, biometrics has enormous potential both to enhance the services on offer and to breach privacy.

In thinking about facial recognition, we start once more with the Collection Principle. Importantly, nothing in the Australian Privacy Act circumscribes the manner of collection; no matter how a data custodian comes to be in possession of Personal Information (being essentially any data about a person whose identity is apparent) they may be deemed to have collected it. When one Facebook member tags another in a photo on the site, Facebook has overtly but indirectly collected PI about the tagged person.

Facial recognition technologies are deployed within Facebook to allow its servers to automatically make tag suggestions; in our view this process constitutes a new type of Personal Information Collection, on a potentially vast scale.

Biometric facial recognition works by processing image data to extract certain distinguishing features (like the separation of the eyes, nose, ears and so on) and computing a numerical data set known as a template that is highly specific to the face, though not necessarily unique. Facebook's online help indicates that they create templates from multiple tagged photos; if a user removes a tag from one of their photos, that image is not used in the template.

Facebook subsequently makes tag suggestions when a member views photos of their friends. They explain the process thus:

  • "We are able to suggest that your friend tag you in a picture by scanning and comparing your friend's pictures to information we've put together from the other photos you've been tagged in".

So we see that Facebook must be more or less continuously checking images from members' photo albums against its store of facial recognition templates. When a match is detected, a tag suggestion is generated and logged, ready to be displayed next time the member is online.
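
Facebook's actual algorithms are proprietary, but the mechanics described above can be sketched generically. Assuming some feature extractor has already reduced each face image to a numerical vector, the template and matching steps look roughly like this (the vectors and threshold are invented for illustration):

    # Generic sketch of template construction and tag suggestion.
    # Feature vectors here are tiny made-up examples; a real system
    # would derive them from face images with a proprietary extractor.
    import numpy as np

    def build_template(tagged_vectors):
        """Pool the feature vectors from photos a member is tagged in.
        Removing a tag removes that photo's vector from the pool."""
        return np.mean(np.asarray(tagged_vectors), axis=0)

    def suggest_tags(photo_vector, templates, threshold=0.5):
        """Compare an uploaded photo against every stored template;
        any near match is logged as a tag suggestion."""
        hits = []
        for member, template in templates.items():
            distance = float(np.linalg.norm(photo_vector - template))
            if distance < threshold:
                hits.append((member, round(distance, 3)))
        return sorted(hits, key=lambda h: h[1])

    # Alice's template, built from three photos she was tagged in:
    templates = {"alice": build_template([[0.90, 0.20], [1.00, 0.25],
                                          [0.95, 0.20]])}

    # A friend uploads a new photo; a near match triggers a suggestion.
    print(suggest_tags(np.array([0.93, 0.22]), templates))
    # -> [('alice', 0.02)]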

What concerns us is that the proactive creation of biometric matches constitutes a new type of PI Collection, for Facebook must be attaching names -- even tentatively, as metadata -- to photos. This is a covert and indirect process.

Photos of anonymous strangers are not Personal Information, but metadata that identifies people in those photos most certainly is. Thus facial recognition is converting hitherto anonymous data -- uploaded in the past for personal reasons unrelated to photo tagging let alone covert identification -- into Personal Information.

Facebook limits the ability to tag photos to members who are friends of the target. This is purportedly a privacy enhancing feature, but unfortunately Facebook has nothing in its Data Use Policy to limit the use of the biometric data compiled through tagging. Restricting tagging to friends likely benefits Facebook, for it reduces the number of specious or mischievous tags, and it probably enhances accuracy by having faces identified only by those who know the individuals.

A fundamental clash with the Collection Limitation Principle

In Australian privacy law, as with the OECD framework, the first and foremost privacy principle concerns Collection. Australia's National Privacy Principle NPP 1 requires that an organisation refrain from collecting Personal Information unless (a) there is a clear need to collect that information; (b) the collection is done by fair means, and (c) the individual concerned is made aware of the collection and the reasons for it.

In accordance with the Collection Principle (and others besides), a conventional privacy notice and/or privacy policy must give a full account of what Personal Information an organisation collects (including that which it creates internally) and for what purposes. And herein lies a fundamental challenge for most online social networks.

The core business model of many Online Social Networks is to take advantage of Personal Information, in many and varied ways. From the outset, Facebook founder, Mark Zuckerberg, appears to have been enthusiastic for information built up in his system to be used by others. In 2004, he told a colleague "if you ever need info about anyone at Harvard, just ask" (as reported by Business Insider). Since then, Facebook has experienced a string of privacy controversies, including the "Beacon" sharing feature in 2007, which automatically imported members' activities on external websites and re-posted the information on Facebook for others to see.

Facebook's privacy missteps are characterised by the company using the data it collects in unforeseen and barely disclosed ways. Yet this is surely what Facebook's investors expect the company to be doing: innovating in the commercial exploitation of personal information. The company's huge market valuation derives from a widespread faith in the business community that Facebook will eventually generate huge revenues. An inherent clash with privacy arises from the fact that Facebook is a pure play information company: its only significant asset is the information it holds about its members. There is a market expectation that this asset will be monetised and maximised. Logically, anything that checks the network's flux in Personal Information -- such as the restraints inherent in privacy protection, whether adopted from within or imposed from without -- must affect the company's future.

Conclusion

Perhaps the toughest privacy dilemma for innovation in commercial Online Social Networking is that these businesses still don't know how they are going to make money from their Personal Information lode. Even if they wanted to, they cannot tell what use they will eventually make of it, and so a fundamental clash with the Collection Limitation Principle remains.

Acknowledgements

An earlier version of this article was published by LexisNexis in the Privacy Law Bulletin (2010).

References

  • Greenleaf G., "Global Data Privacy Laws: 89 Countries, and Accelerating", Privacy Laws & Business International Report, Issue 115, Special Supplement, February 2012; Queen Mary School of Law Legal Studies Research Paper No. 98/2012.
  • Moubarak G., Guiot A., et al., "Facebook activity of residents and fellows and its impact on the doctor-patient relationship", J Med Ethics, 15 December 2010.

Posted in Social Networking, Social Media, Privacy

What's really happening to privacy?

The cover of Newsweek magazine on 27 July 1970 featured a cartoon couple cowered by computer and communications technology, and the urgent all-caps headline “IS PRIVACY DEAD?”

Four decades on, Newsweek is dead, but we’re still asking the same question.

Every generation or so, our notions of privacy are challenged by a new technology. In the 1880s (when Warren and Brandeis developed the first privacy jurisprudence) it was photography and telegraphy; in the 1970s it was computing and consumer electronics. And now it’s the Internet, a revolution that has virtually everyone connected to everyone else (and soon everything) everywhere, and all of the time. Some of the world’s biggest corporations now operate with just one asset – information – and a vigorous “publicness” movement rallies around the purported liberation of shedding what writers like Jeff Jarvis (in his 2011 book “Public Parts”) say are old fashioned inhibitions. Online Social Networking, e-health, crowd sourcing and new digital economies appear to have shifted some of our societal fundamentals.

However, the past decade has seen a dramatic expansion of countries legislating data protection laws, in response to citizens’ insistence that their privacy is as precious as ever. And consumerized cryptography promises absolute secrecy. Privacy has long stood in opposition to the march of invasive technology: it is the classical immovable object met by an irresistible force.

So how robust is privacy? And will the latest technological revolution finally change privacy forever?

Soaking in information

We live in a connected world. Young people today may have grown tired of hearing what a difference the Internet has made, but a crucial question is whether relatively new networking technologies and sheer connectedness are exerting novel stresses to which social structures have yet to adapt. If “knowledge is power” then the availability of information probably makes individuals today more powerful than at any time in history. Search, maps, Wikipedia, Online Social Networks and 3G are taken for granted. Unlimited deep technical knowledge is available in chat rooms; universities are providing a full gamut of free training via Massive Open Online Courses (MOOCs). The Internet empowers many to organise in ways that are unprecedented, for political, social or business ends. Entirely new business models have emerged in the past decade, and there are indications that political models are changing too.

Most mainstream observers still tend to talk about the “digital” economy but many think the time has come to drop the qualifier. Important services and products are, of course, becoming inherently digital and whole business categories such as travel, newspapers, music, photography and video have been massively disrupted. In general, information is the lifeblood of most businesses. There are countless technology billionaires whose fortunes have been made in industries that did not exist twenty or thirty years ago. Moreover, some of these businesses only have one asset: information.

Banks and payments systems are getting in on the action, innovating at a hectic pace to keep up with financial services development. There is a bewildering array of new alternative currencies like Linden dollars, Facebook Credits and Bitcoins – all of which can be traded for “real” (reserve bank-backed) money in a number of exchanges of varying reputation. At one time it was possible for Entropia Universe gamers to withdraw dollars at ATMs against their virtual bank balances.

New ways to access finance have arisen, such as peer-to-peer lending and crowd funding. Several so-called direct banks in Australia exist without any branch infrastructure. Financial institutions worldwide are desperate to keep up, launching amongst other things virtual branches and services inside Online Social Networks (OSNs) and even virtual worlds. Banks are of course keen to not have too many sales conducted outside the traditional payments system where they make their fees. Even more strategically, banks want to control not just the money but the way the money flows, because it has dawned on them that information about how people spend might be even more valuable than what they spend.

Privacy in an open world

For many of us, on a personal level, real life is a dynamic blend of online and physical experiences. The distinction between digital relationships and flesh-and-blood ones seems increasingly arbitrary; in fact we probably need new words to describe online and offline interactions more subtly, without implying a dichotomy.

Today’s privacy challenges are about more than digital technology: they really stem from the way the world has opened up. The enthusiasm of many for such openness – especially in Online Social Networking – has been taken by some commentators as a sign of deep changes in privacy attitudes. Facebook's Mark Zuckerberg for instance said in 2010 that “People have really gotten comfortable not only sharing more information and different kinds, but more openly and with more people - and that social norm is just something that has evolved over time”. And yet serious academic investigation of the Internet’s impact on society is (inevitably) still in its infancy. Social norms are constantly evolving but it’s too early to tell if they have reached a new and more permissive steady state. The views of information magnates in this regard should be discounted given their vested interest in their users' promiscuity.

At some level, privacy is about being closed. And curiously for a fundamental human right, the desire to close off parts of our lives is relatively fresh. Arguably it’s even something of a “first world problem”. Formalised privacy appears to be an urban phenomenon, unknown as such to people in villages when everyone knew everyone – and their business. It was only when large numbers of people congregated in cities that they became concerned with privacy. For then they felt the need to structure the way they related to large numbers of people – family, friends, work mates, merchants, professionals and strangers – in multi-layered relationships. So privacy was born of the first industrial revolution. It has taken prosperity and active public interest to create the elaborate mechanisms that protect our personal privacy from day to day and which we take for granted today: the postal services, direct dial telephones, telecommunications regulations, individual bedrooms in large houses, cars in which we can escape for a while, and now of course the mobile handset.

In control

Privacy is about respect and control. Simply put, if someone knows me, then they should respect what they know; they should exercise restraint in how they use that knowledge, and be guided by my wishes. Generally, privacy is not about anonymity or secrecy. Of course, if we live life underground then unqualified privacy can be achieved, yet most of us exist in diverse communities where we actually want others to know a great deal about us. We want merchants to know our shipping address and payment details, healthcare providers to know our intimate details, hotels to know our travel plans and so on. Practical privacy means that personal information is not shared arbitrarily, and that individuals retain control over the tracks of their lives.

Big Data: Big Future

Big Data tools are being applied everywhere, from sifting telephone call records to spot crimes in the planning, to DNA and medical research. Every day, retailers use sophisticated data analytics to mine customer data, ostensibly to better uncover true buyer sentiments and continuously improve their offerings. Some department stores are interested in predicting such major life changing events as moving house or falling pregnant, because then they can target whole categories of products to their loyal customers.
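
The mechanics behind such predictions are unremarkable: a model is fitted to customers whose outcome is known, then scored over everyone else. The toy sketch below uses invented product weights, not any retailer's actual model, to show how ordinary purchase counts become a sensitive inference:

    # Toy sketch of a purchase-based prediction score. The features and
    # weights are invented for illustration; a retailer would learn them
    # from customers whose outcome is known (e.g. baby-registry members).
    import math

    WEIGHTS = {
        "unscented_lotion": 1.2,
        "mineral_supplements": 0.9,
        "cotton_balls_bulk": 0.7,
        "large_handbag": 0.4,
    }
    BIAS = -2.5

    def pregnancy_score(basket_counts):
        """Logistic score in [0, 1] from recent purchase counts."""
        z = BIAS + sum(WEIGHTS.get(item, 0.0) * count
                       for item, count in basket_counts.items())
        return 1.0 / (1.0 + math.exp(-z))

    # A loyalty-card customer who has disclosed nothing:
    print(pregnancy_score({"unscented_lotion": 2,
                           "mineral_supplements": 1,
                           "cotton_balls_bulk": 1}))  # ~0.82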

Real time Big Data will become embedded in our daily lives, through several synchronous developments. Firstly, computing power, storage capacity and high speed Internet connectivity all continue to improve at exponential rates. Secondly, there are more and more “signals” for data miners to choose from. No longer do you have to consciously tell your OSN what you like or what you’re doing, because new augmented reality devices are automatically collecting audio, video and locational data, and trading it around a complex web of digital service providers. And thirdly, miniaturisation is leading to a whole range of smart appliances, smart cars and even smart clothes with built-in or ubiquitous computing.

The privacy risks are obvious, and yet the benefits are huge. So how should we think about the balance in order to optimise the outcome? Let’s remember that information powers the new digital economy, and the business models of many major new brands like Facebook, Twitter, Foursquare and Google incorporate a bargain for Personal Information. We obtain fantastic services from these businesses “for free” but in reality they are enabled by all that information we give out as we search, browse, like, friend, tag, tweet and buy.

The more innovation we see ahead, the more certain it seems that data will be the core asset of cyber enterprises. To retain and even improve our privacy in the unfolding digital world, we must be able to visualise the data flows that we’re engaged in, evaluate what we get in return for our information, and determine a reasonable trade-off between costs and benefits.

Is Privacy Dead? If the same rhetorical question needs to be asked over and over for decades, then it’s likely the answer is no.

Posted in Social Networking, Privacy, Internet, Culture, Big Data

Classic Facebook stalking horse

Yesterday Instagram made its first move towards delivering the real value in its acquisition by Facebook. They revised their Privacy Policy and Terms of Use to allow greater sharing of photos with Facebook and other businesses, especially advertisers. Instagram posted a new set of Terms on Monday, the shit hit the fan, and today they back-pedalled.

The mea culpa is a classic, straight out of the Zuckerberg copybook. They say they were misunderstood. They say they don't want to sell photos to ad men. They say members will always own their photos. But ownership is a red herring and the whole exercise is likely a stalking horse, designed to distract people from more significant issues around metadata and Facebook's ever deepening ability to infer PII.

Firstly, let's be clear that greater sharing follows the acquisition as night follows day. I noted at the time that the only way to understand Facebook's billion dollar spend on Instagram is around the value to be mined from the mother lode of photo data. In particular, image analysis and facial recognition grant Instagram and Facebook x-ray vision into their members' daily lives. They can work out what people are doing, with whom they're doing it, when and where. With these tools, they're moving quickly from collecting Personally Identifiable Information when it is volunteered by users, to PII that is observed and inferred. The quality and quantity of the PII flux is driven up dramatically. No longer is the lifeblood of Facebook -- the insights they have on 15% of the world's population -- filtered by what users elect to post and Like and tag, but now that information is raw, unexpurgated and automated.

Now ask where the money in photo data is to be made. It's not in selling candid snapshots of folks enjoying branded products. It's in the intelligence that image data yield about how people lead their lives. This intelligence is Facebook's one and only asset.

So it is metadata that we need to worry about. In its initial update to the Terms, Instagram said this: "[You] agree that a business or other entity may pay us to display your username, likeness, photos (along with any associated metadata), and/or actions you take, in connection with paid or sponsored content or promotions, without any compensation to you." In over 6,000 words "metadata" is mentioned just twice, parenthetically, and without any definition. Metadata is figuring more and more in the privacy discourse, and that's great, but we need to look beyond the usual stuff like geolocation and camera type embedded in the JPEGs. Much more important now is the latent identifiable personal content in images. Image analysis and image search provide endless new possibilities for infomopolies to extract value from photos.
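
The "usual stuff" is trivially easy to extract: a few lines with the Pillow imaging library will pull camera details and GPS coordinates out of a shared JPEG (the file name below is a placeholder). The deeper point is that the identifiable content of the image itself needs no embedded metadata at all.

    # Minimal sketch: reading conventional JPEG metadata with Pillow.
    # "photo.jpg" is a placeholder for any shared image file.
    from PIL import Image, ExifTags

    def read_exif(path):
        exif = Image.open(path).getexif()
        named = {ExifTags.TAGS.get(tag_id, tag_id): value
                 for tag_id, value in exif.items()}
        # GPS coordinates live in their own nested block (IFD 0x8825).
        gps = exif.get_ifd(0x8825)
        named["GPSInfo"] = {ExifTags.GPSTAGS.get(t, t): v
                            for t, v in gps.items()}
        return named

    for key, value in read_exif("photo.jpg").items():
        print(key, ":", value)   # e.g. Model, DateTime, GPSLatitude ...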

A great deal of this week's outcry has focused on things like the lack of compensation, and all of Instagram's apology today is around the ownership of photos. But ownership is moot if they reserve their right to use and disclose metadata in any way they like. What actually matters is the individual's ability to understand and control what is done with any PII about them, including metadata. When the German privacy regulator acted against Facebook's facial recognition practices earlier this year, the principle they applied from OECD-style legislation was that there are limits to what can be collected about individuals without their consent. The regulator ruled it unlawful for Facebook to extract biometric information from images when their users innocently think they're only tagging people in photos.

So when I read Instagram's excuse, I don't see any truly meaningful self-restraint in the way they can exploit image data. Their switch is not even a tactical retreat, for as yet, they're not giving anything up.

Posted in Social Networking, Privacy, Big Data

Don't mix business and pleasure

At the recent Gartner Identity & Access Summit, analyst Earl Perkins spoke of the potential for Facebook to be used as an enterprise IdP. I'd like to see these sorts of speculations dampened a little by filtering them through the understanding that identity is a proxy for relationship.

Here's the practical difficulty that shows why we must reframe what we're talking about. If Facebook were to be an Identity Issuer, they would have to be clear about what enterprises really need to know about their staff, customers, partners and so on. There is no standardised answer to that; every business gets to know its people in their own peculiar ways. Does Facebook with its x-ray vision into our personal lives have anything to offer enterprises? If we work out which assertions might be vouched for by Facebook, how would they be underwritten exactly?

And I really mean exactly because liability is what kills off most identity federations. The idea of re-using identity across contexts is easier said than done. Banks have tried and tried again to federate identities amongst themselves. The Australian experience (of Trust Centre and MAMBO) was that banks find it too complex to re-use each other's issued IDs because of the legal complexity, even when they're all operating under the same laws and regulations! So how on earth will businesses make the jump to using Facebook as an IdP when they have yet to figure out banks as IdPs?

I'd surely like to hear from Facebook themselves about how they see their IdP business developing. They're being very coy about even the early forays like Facedeals, which is using biometric data from Facebook to check people into stores by facial recognition. It's a pretty serious app, with very serious privacy ramifications, amplified by the fact that German regulators have thrown the book at Facebook for being underhanded with photo tagging. Under the circumstances, I would have expected Facedeals to have a Privacy Policy, and Facebook to make some public announcements about how they support the third party consumption of their biometric templates. But no, neither has happened.

The old saw "don't mix business and pleasure" turns out to predict the cyber world challenges of bringing social identities and business identities together. I have concluded that identity is metaphorical. Each identity is really a proxy for a relationship, and most of our intuitions about identity need to be reframed in terms of relationships. We're not talking simply about names! The types of relationship we entertain socially (and are free to curate for ourselves) may be fundamentally irreconcilable with the identities provided to us by businesses as a way to manage their risks, as is their prerogative.

Posted in Social Networking, Identity, Federated Identity

If Facebook were honest

The first and foremost privacy principle in any data protection regime is Collection Limitation. A classic instance is Australia's National Privacy Principle NPP 1, which requires that an organisation refrain from collecting Personal Information unless (a) there is a clear need to collect that information; (b) the collection is done by fair means, and (c) the individual concerned is made aware of the collection and the reasons for it.

In accordance with the Collection Principle (and others besides), a conventional privacy notice or privacy policy should give a full account of what Personal Information an organisation collects (including that which it creates internally) and why it collects it.

And herein lies a fundamental challenge for most online social networks: if they were honest about the Collection Principle, they would have to say "We collect information about you to make money".

The core business model of many Online Social Networks is to exploit Personal Information, in many and varied ways. There's a bargain for Personal Information inherent in commercial social media. Some say the bargain is obvious to today's savvy netizens; it's said that everybody knows there is no such thing as a free lunch. But I am not so sure. I doubt that the average Facebook user really grasps what's going on. The bargain for their information is opaque and unfair.

From the outset, Facebook founder Mark Zuckerberg was tellingly enthusiastic for information built up in his system to be used by others. In 2004, he told a colleague "if you ever need info about anyone at Harvard, just ask".

Facebook has experienced a more or less continuous string of privacy controversies, including the "Beacon" sharing feature in 2007, which automatically imported members' activities on external websites and re-posted the information on Facebook for others to see. Facebook's privacy missteps almost always relate to the company using the data it collects in unforeseen and barely disclosed ways. Yet this is surely what Facebook's investors expect the company to be doing: innovating in the commercial exploitation of personal information. An inherent clash with privacy arises from the fact that Facebook is a pure play information company: its only significant asset is the information it holds about its members. The market expects this asset to be monetised and maximised. Logically, anything that checks the network's flux in Personal Information -- such as the restraints inherent in privacy protection, whether adopted from within or imposed from without -- must affect the company's future.

Facebook's business model is enhanced by promiscuity amongst its members, so there is an apparent conflict of interest in the firm's privacy posture. The more information its members are willing to divulge, the greater is Facebook's value. Zuckerberg is far from a passive bystander in this; he has long tried to train his members to abandon privacy norms, in order to generate ever more information flux upon which the site depends. He is brazenly quick to judge what he sees as broader societal shifts. Interviewed at the 2010 TechCrunch conference, he said:

[In] the last five or six years, blogging has taken off in a huge way and all these different services that have people sharing all this information. People have really gotten comfortable not only sharing more information and different kinds, but more openly and with more people. That social norm is just something that has evolved over time. We view it as our role in the system to constantly be innovating and be updating what our system is to reflect what the current social norms are.

It is rather too early to draw this sort of sweeping generalisation from the behaviours of a self-selected cohort of socially hyperactive users. Without underestimating the empirical importance of Facebook to hundreds of millions of people, surely one of the over-riding characteristics of OSN as a pastime is simply that it is fun. There is a sort of suspension of disbelief at work when people act in this digital world, divorced from normal social cues, which may lead them to lower their guard. Facebook users are not fully briefed on the consequences of their actions, and so their behaviour to some extent is being directed by the site designers; it has not evolved naturally as Zuckerberg would have us believe.

Yet promiscuity is not in fact the source of the most valuable social data. Facebook has a particularly sorry history of hiding its most effective collection methods from view. Facial recognition is perhaps the best example. While it has offered photo tagging for years, it was only in early 2012 that Facebook started to talk plainly about how it constructs biometric templates from tags, and how it runs those templates over stored photo data to come up with tag suggestions. Meanwhile, the application of facial recognition is quietly expanding beyond what they reveal, with the likes of Facedeals for example starting to leverage Facebook's templates, in ways that are not disclosed in any Privacy Policies anywhere.

Privacy is largely about transparency. Businesses owe it to their members and customers to honestly disclose what data is collected and why. While social networks continue to obfuscate the true exchange of Personal Information for commercial value, we cannot take seriously their claims to respect our privacy.

Posted in Social Networking, Privacy

It's not too late for privacy

Have you heard the news? "Privacy is dead!"

It's an urgent, impatient sort of line in the sand, drawn by the new masters of the digital universe, as a challenge to everyone else. C'mon, get with the program! Innovate! Don't be so precious - so very 20th century! Don't you dig that Information Wants To Be Free? Clearly, old fashioned privacy is holding us back!

The stark choice posited between privacy and digital liberation is rarely examined with much diligence; often it's actually a fatalistic response to the latest breach or the latest eye-popping digital development. In fact, those who earnestly assert that privacy is dead are almost always trying to sell us something, be it a political ideology, or a social networking prospectus, or sneakers targeted at an ultra-connected, geolocated, behaviorally qualified nano market segment.

Is it really too late for privacy? Is the genie out of the bottle? Even if we accepted the ridiculous premise that privacy is at odds with progress, no it's not too late, firstly because the pessimism (or commercial opportunism) generally confuses secrecy with privacy, and secondly because frankly, we aint seen nothin yet!

Conflating privacy and secrecy

Technology certainly has laid us bare. Behavioural modeling, facial recognition, Big Data mining, natural language processing and so on have given corporations x-ray vision into our digital lives. While exhibitionism has been cultivated and normalised by the infomopolists, even the most guarded social network users may be defiled by Big Data wizards who without consent upload their contact lists, pore over their photo albums, and mine their shopping histories, as is their wanton business model.

So yes, a great deal about us has leaked out into what some see as an extended public domain. And yet we can be public and retain our privacy at the same time.

Some people seem defeated by privacy's definitional difficulties, yet information privacy is simply framed, and corresponding data protection laws readily understood. Information privacy is basically a state where those who know us are restrained in what they can do with the knowledge they have about us. Privacy is about respect, and protecting individuals against exploitation. It is not about secrecy or even anonymity. There are few cases where ordinary people really want to be anonymous. We actually want businesses to know -- within limits -- who we are, where we are, what we've done, what we like, but we want them to respect what they know, to not share it with others, and to not take advantage of it in unexpected ways. Privacy means that organisations behave as though it's a privilege to know us.

Many have come to see privacy as literally a battleground. The grassroots Cryptoparty movement has come together around a belief that privacy means hiding from the establishment. Cryptoparties teach participants how to use Tor and PGP, and spread a message of resistance. They take inspiration from the Arab Spring where encryption has of course been vital for the security of protestors and organisers. The one Cryptoparty I've attended so far in Sydney opened with tributes from Anonymous, and a number of recorded talks by activists who ranged across a spectrum of social and technosocial issues like censorship, copyright, national security and Occupy. I appreciate where they're coming from, for the establishment has always overplayed its security hand. Even traditionally moderate Western countries have governments charging like china shop bulls into web filtering and ISP data retention, all in the name of a poorly characterised terrorist threat. When governments show little sympathy for netizenship, and absolutely no understanding of how the web works, it's unsurprising that sections of society take up digital arms in response.

Yet going underground with encryption is a limited privacy stratagem, for DIY crypto is incompatible with the majority of our digital dealings. In fact the most nefarious, uncontrolled and ultimately the most dangerous privacy harms come from mainstream Internet businesses and not government. Assuming one still wants to shop online, use a credit card, tweet, and hang out on Facebook, we still need privacy protections. We need limitations on how our Personally Identifiable Information (PII) is used by all the services we deal with; we need department stores to refrain from extracting sensitive health information from our shopping habits, merchants to not use our credit card numbers as customer reference numbers, and online social networks to not x-ray our photo albums by biometric face recognition. I note that some Cryptoparty bookings are managed by the US event organiser Eventbrite, which has a detailed Privacy Policy setting out how it promises to handle personal information provided by attendees. It does seem reasonable to me, but like all private sector data protection arrangements, there's a lot going on there.

So, ironically, when registering for a cryptoparty, you cannot use encryption! For privacy, you have to either trust Eventbrite to have a reasonable policy and stick to it, or rely on government regulations, if applicable. When registering, you give a little Personal Information to the organisers, and expect that they will be restrained in what they do with it.

Going out in public never was a license for others to invade our privacy. We ought not to respond to online privacy invasions as if cyberspace is a new Wild West. We have always relied on regulatory systems of consumer protection to curb the excesses of business and government, and we should insist on the same in the digital age. We should not have to hide away if privacy is agreed to mean respecting the PII of customers, users and citizens, and restraining what data custodians do with that precious resource.

We aint seen nothin yet!

I ask anyone who thinks it's too late to reassert our privacy to think for a minute about where we're heading. We're still in the early days of the social web, and the information "innovators" have really only just begun. Look at what they've done so far:

  • Facial recognition converts vast stores of anonymous photos into PII, without consent, and without limit. Facebook's deployment of biometric technology was especially clever. For years they crowd-sourced the creation of templates and the calibration of their algorithms, without ever mentioning facial recognition in their privacy policy or help pages. Even now Facebook's Data Use Policy is entirely silent on biometric templates and what they allow themselves to do with them. Meanwhile, third party services like Facedeals are starting to use Facebook's photo resources for commercial facial recognition in public.
  • It's difficult to overstate the value of facial recognition to businesses like Facebook which have just one asset: the knowledge they have about their members. Combined with image analysis and content addressable image banks, facial recognition lets Facebook work out what we're doing, when, where and with whom, pirating billions of everyday images given over by members to a business that doesn't even mention these priceless resources in its privacy policy.

  • Big Data. The most notorious recent example of the power of data mining comes from Target's covert research into identifying customers who are pregnant based on their buying habits. Big Data practitioners are so enamoured with their ability to extract secrets from "public" data they seem blithely unaware that by generating fresh PII from their raw materials they are in fact collecting it as far as Information Privacy Law is concerned. As such, they’re legally liable for the privacy compliance of their cleverly synthesised data, just as if they had expressly gathered it all by questionnaire.

  • Natural Language Processing (NLP) is the secret sauce in Apple's Siri, allowing her to take commands -- and dictation. Every time you dictate an email or a text message to Siri, Apple gets hold of the content of telecommunications that are normally out of bounds to the phone companies. Siri is like a free PA that reports your daily activities back to the secretarial agency. There is no mention at all of Siri in Apple's Privacy Policy despite the limitless collection of intimate personal information.

As an aside, I'm not one of those who fret that technology has outstripped privacy law. Principles-based Information Privacy law copes well with most of this technology. OECD privacy principles (enacted in over seventy countries) and the US FIPPs require that companies be transparent about what PII they collect and why, and that they limit the ways in which PII is used for unrelated purposes, and how it may be disclosed. These principles are decades old, and yet they were recently re-affirmed by German regulators over Facebook's surreptitious use of facial recognition. I expect that Siri will attract similar scrutiny as it rolls out in continental Europe.

So what's next?

  • Google Glass may, in the privacy stakes, surpass both Siri and facial recognition of static photos. If actions speak louder than words, imagine the value to Google of digitising and knowing exactly what we do in real time.

  • Facial recognition as a Service and the sale of biometric templates may be tempting for the photo sharing sites. If and when biometric authentication spreads into retail payments and mobile device security, these systems will face the challenge of enrollment. It might be attractive to share face templates previously collected by Facebook and voice prints by Apple.

So, is it really too late for privacy? The infomopolists and national security zealots may hope so, but surely even cynics will see there is a great deal at stake, and that it might be just a little too soon to rush to judge something as important as this.

Posted in Social Networking, Social Media, Privacy, Culture, Big Data

Identity is not a thing

We think we're talking about a thing when we refer to identity provisioning, or "Bring Your Own Identity", or the choice of identity that's axiomatic in NSTIC. The Laws of Identity encouraged us to think in terms of identity as a commodity, but at the same time the Laws cannily defined Digital Identity as a "set of claims".

So identity is not a thing.

Rather, identity is a state of affairs: Identity is How I Am Known.

[Update February 2013. I am embarrassed to admit I have only just discovered the work of Goffman and the dramaturgical analysis of identity. Goffman found that identity is an emergent property from social interaction, that it comes dynamically from the roles we play, and that it is formed by the way we believe others see us. That is, personal identity is partly impressed upon us. This is the sort of view I have arrived at with Digital Identity. Read on ...]

Digital identity is really just the conspicuous surface of a relationship we have with the Identity Provider (IdP). That relationship grows over time, starting from the evidence of identity (like the legislated "100 point" check in Australian banking) gathered at registration time, after which the IdP issues our identifier. But the identifier is really just a proxy for the relationship we have with a service provider, a relationship which can be deep and unfolding, and usually more complex than any identifier on its own would suggest. The original evidence of identity is just a boundary condition; it might be common across several relationships for a time, but it's really not what the ongoing relationship is all about.

So what can it mean to try and exercise a choice of identity? In business it's the Relying Party that bears most of the risk if an identity is wrong, and so it is that the Relying Party is very often the IdP, for then they can best manage their risk. And here the choice of business identity is moot. If you don't have an identity that meets the RP's needs, then they have the prerogative to turn you away. Think about a store that doesn't accept Diners Club; do you have any prospect of negotiating with them to pay by Diners if that's your choice of card? Can it make any difference to the store owner that you might have extra credentials to present in real time?

However, in social dealings, identity is different. Here we do narrate our own life stories, we curate our own identities.

What's going on here? How do we reconcile these contradictions across our plurality of identities? It might help to describe two different orders of Digital Identity:


  • Expressed Identities that we control for ourselves and exercise in social circles, and

  • Impressed Identities that are bestowed upon us by employers, businesses and government. We have little or no control over how Impressed Identities are created, save for the ultimate power to simply decline a job, a bank account or a passport if we don't like the conditions that go with them.

And every now and then, Expressed and Impressed identities come into conflict, never more viscerally than in what I call the High School Reunion Effect. Most of us have experienced the psychic dislocation of meeting old school friends for the first time in decades at a reunion. We've changed; they've changed; our current lives and contexts are unknown and unknowable to our old peers. Instead the group context is frozen in time, and we all struggle to relate to one another according to old identities, while editing ourselves to reflect the new individuals we have become in new contexts. But here's the thing: our old identities actually return, to varying degrees, impressed by how the group as a whole used to be. So identity is plastic.

High school reunions showcase the dynamic mixture of Impressed and Expressed identities. The way we choose to express ourselves is molded to a point to fit an inter-personal context impressed upon us by a community.

Another example - of greater practical importance - of the tension between impressed and expressed identity is the "Real Name" policies of Google and Facebook. Here we saw a mighty clash between the right of people to define how they are known in distinct spheres, and the interest of network operators in "knowing" their users for commercial purposes. Perhaps that type of conflict would be better understood if we saw that different orders of identity have different degrees of freedom. Identity is literally relative.

And then there is the Bring Your Own Identity movement, another battleground where competing intuitions about identity are playing out. Here the claimed right to use whatever identification method one likes butts up against the enterprise's need to set its own standards for authentication technology and identification risk management. Some BYOI advocates say this is not just about user convenience; businesses may save serious money through BYOI because it spares them from issuing their own IDs, just as BYOD is thought to reduce device support costs. But in most cases, the cost to the business of mapping and interfacing all the expressed identities that users might elect to bring simply exceeds the cost of the organisation impressing IDs for itself.

Digital Identity is a heady intersection of social, technological, business and political frames of reference. Our intuitions - not surprisingly really - can fail us in cyberspace. I reckon progress in NSTIC and similar initiatives will depend on us appreciating that identity online isn't always what it seems.

Posted in Social Networking, Privacy, Nymwars, Identity, Federated Identity

A penny for your marketable thoughts?

Most people think that Apple's Siri is the coolest thing they've ever seen on a smart phone. It certainly is a milestone in practical human-machine interfaces, and will be widely copied. The combination of deep search plus natural language processing (NLP) plus voice recognition is dynamite.

If you haven't had the pleasure ... Siri is a wondrous function built into the Apple iPhone. It’s the state-of-the-art in Artificial Intelligence and NLP. You speak directly to Siri, ask her questions (yes, she's female) and tell her what to do with many of your other apps. Siri integrates with mail, text messaging, maps, search, weather, calendar and so on. Ask her "Will I need an umbrella in the morning?" and she'll look up the weather for you – after checking your calendar to see what city you’ll be in tomorrow. It's amazing.

Natural Language Processing is a fabulous idea of course. It radically improves the usability of smart phones, and even their safety with much improved hands-free operation.

An important technical detail is that NLP is very demanding on computing power. In fact it's beyond the capability of today's smart phones, even if each of them alone is more powerful than all of NASA's computers in 1969! So all Siri's hard work is actually done on Apple's servers scattered around the planet. That is, all your interactions with Siri are sent into the cloud.
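To make the architecture concrete, here is a hedged sketch in Python of the round trip such an assistant makes. The endpoint, field names and device identifier are hypothetical stand-ins, not Apple's actual interface.

    # Illustrative sketch of cloud-based speech processing. The URL and
    # payload fields are hypothetical, not any real vendor's API.
    import requests

    def ask_assistant(audio_path: str) -> str:
        with open(audio_path, "rb") as f:
            audio = f.read()
        # The raw recording leaves the device: this is the privacy-critical step.
        response = requests.post(
            "https://nlp.example.com/v1/interpret",  # hypothetical endpoint
            files={"audio": audio},
            data={"device_id": "abc123", "locale": "en-AU"},
        )
        # Transcription, interpretation, and any retention of the audio and
        # transcript all happen server-side, beyond the user's control.
        return response.json()["answer"]

Whatever the real protocol looks like, the essential point stands: both the audio and the inferred text come to rest on the provider's servers, where retention and secondary use are governed by the provider's policy, not by anything on the handset.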

Imagine Siri was a human personal assistant. Imagine she's looking after your diary, placing calls for you, booking meetings, planning your travel, taking dictation, sending emails and text messages for you, reminding you of your appointments, even your significant other's birthday. She's getting to know you all the while, learning your habits, your preferences, your personal and workaday networks.

And she's free!

Now, wouldn't the offer of a free human PA strike you as too good to be true?

Indeed it would. So understand this about Siri: she's continuously reporting back to base. If Apple were a secretarial placement agency, what they get in return for the free services is a transcript of what you've said, the people you've been in touch with, where you've been, and what you plan to do. Apple won't say what they plan to do with all this data, how long they'll keep it, or who they'll share it with. Apple's Privacy Policy (dated October 2011, accessed 12 March 2012) doesn't even mention Siri or the collection of voice-to-text data.

When you dictate your emails and text messages to Siri, you're providing Apple with content that's usually off limits to carriers, phone companies and ISPs. Siri is an end run around telecommunications intercept laws.

Of course there are many, many examples where free social media apps mask a commercial bargain. Face recognition is the classic case. It was first made available on photo sharing sites as a neat way to organise one's albums, but then Facebook went further, inviting photo tags from users and then automatically identifying people in other photos on others' pages. What's happening behind the scenes is that Facebook is running its face recognition templates over the billions of photos in its databases (which were originally uploaded for personal use long before face recognition was deployed). Given their business model and their track record, we can be certain that Facebook is using face recognition to identify everyone they possibly can, and thence to work out fresh associations between countless people and situations accidentally caught on camera.

Combine this with image processing and visual search technology (like Google "Goggles") and the big social media companies have an incredible new eye in the sky. They can work out what we're doing, when, where and with whom. Nobody will need to expressly "like" anything anymore when OSNs can literally see what cars we're driving, what brands we're wearing, where we spend our vacations, what we're eating, what makes us laugh, and who makes us laugh.

Apple, Facebook and others have understandably invested hundreds of millions of dollars in image recognition start-ups and intellectual property; with these tools they convert hitherto anonymous images into content-addressable PII gold mines. It's the next frontier of Big Data.
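For the technically minded, here is a minimal sketch in Python of how template-based face recognition turns an anonymous photo into an identified record. The feature extractor, similarity threshold and random "templates" are toy stand-ins for a real biometric system.

    # Illustrative sketch of biometric template matching; extract_template()
    # and the threshold are toy stand-ins, not any vendor's actual system.
    import numpy as np

    rng = np.random.default_rng(0)

    def extract_template(photo_pixels: np.ndarray) -> np.ndarray:
        # Stand-in for a real face recognition model, which would use a deep
        # network to reduce a face image to a fixed-length feature vector.
        return photo_pixels.reshape(-1)[:128].astype(float)

    def similarity(a: np.ndarray, b: np.ndarray) -> float:
        # Cosine similarity between two templates; 1.0 means near-identical.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Templates "enrolled" from photos users have already tagged themselves in.
    enrolled = {name: rng.random(128) for name in ("alice", "bob")}

    def identify(photo_pixels: np.ndarray, threshold: float = 0.8):
        probe = extract_template(photo_pixels)
        best = max(enrolled, key=lambda n: similarity(enrolled[n], probe))
        # Above the threshold, an anonymous photo becomes identified PII.
        return best if similarity(enrolled[best], probe) >= threshold else None

    print(identify(rng.random((16, 16))))  # toy "photo" for demonstration

Real systems use deep networks to produce the templates, but the matching step -- and the privacy consequence of running it over a back catalogue of photos -- is exactly this simple.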

Now, there wouldn't be much wrong with these sorts of arrangements if the social media corporations were up-front about them, and exercised some restraint. In their Privacy Policies they should detail what Personal Information they are extracting and collecting from all the voice and image data; they should explain why they collect this information, what they plan to do with it, how long they will retain it, and how they promise to limit secondary usage. They should explain that biometrics technology allows them to generate brand new PII out of members' snapshots and utterances. And they should acknowledge that by rendering data identifiable, they become accountable in many places under privacy and data protection laws for its safekeeping as PII. It's just not good enough to vaguely reserve their rights to "use personal information to help us develop, deliver, and improve our products, services, content, and advertising". They should treat their customers -- and all those innocents about whom they collect PII indirectly -- with proper respect, and stop blandly pretending that 'service improvement' is what they're up to.

Siri and face recognition herald a radical new type of privatised surveillance, on a breathtaking scale. While Facebook stealthily "x-rays" photo albums without consent, Apple now has even more intimate access to our daily routines and personal habits. And they don't even pay as much as a penny for our thoughts.

As cool as Siri may be, I myself will decline to use any natural language processing while the software runs in the cloud, and while the service providers refuse to restrain their use of my voice data. I'll wait for NLP to be done on my device with my data kept private.

And I'd happily pay cold hard cash for that kind of app, instead of having an infomopoly embed itself in my personal affairs.

Posted in Social Networking, Social Media, Privacy, Language, Biometrics, Big Data