A Social Media Week Sydney event #SMWSydney
Law Lounge, Sydney University Law School
New Law School Building
Eastern Ave, Camperdown
Fri, Sep 26 - 10:00 AM - 11:30 AM
How can you navigate privacy fact and fiction, without the geeks and lawyers boring each other to death?
It's often said that technology has outpaced privacy law. Many digital businesses seem empowered by this brash belief. And so they proceed with apparent impunity to collect and monetise as much Personal Information as they can get their hands on.
But it's a myth!
Some of the biggest corporations in the world, including Google and Facebook, have been forcefully brought to book by privacy regulations. So, we have to ask ourselves:
- what does privacy law really mean for social media in Australia?
- is privacy "good for business"?
- is privacy "not a technology issue"?
- how can digital businesses navigate fact & fiction, without their geeks and lawyers boring each other to death?
In this Social Media Week Master Class I will:
- unpack what's "creepy" about certain online practices
- show how to rate data privacy issues objectively
- analyse classic misadventures with geolocation, facial recognition, and predicting when shoppers are pregnant
- critique photo tagging and crowd-sourced surveillance
- explain why Snapchat is worth more than three billion dollars
- analyse the regulatory implications of Big Data, Biometrics, Wearables and The Internet of Things.
We couldn't have timed this Master Class better, coming two weeks after the announcement of the Apple Watch, which will figure prominently in the class!
So please come along, for a fun and in-depth look at social media, digital technology, the law, and decency.
About the presenter
Steve Wilson is a technologist, who stumbled into privacy 12 years ago. He rejected those well meaning slogans (like "Privacy Is Good For Business!") and instead dug into the relationships between information technology and information privacy. Now he researches and develops design patterns to help sort out privacy, alongside all the other competing requirements of security, cost, usability and revenue. His latest publications include:
- "Big Privacy: The new standard for Big Data Privacy" from Constellation Research, and
- "The collision between Big Data and privacy law" due out in October in the Australian Journal of Telecommunications and the Digital Economy.
Have you heard the news? "Privacy is dead!"
The message is urgent. It's often shouted in prominent headlines, with an implied challenge. The new masters of the digital universe urge the masses: C'mon, get with the program! Innovate! Don't be so precious! Don't you grok that Information Wants To Be Free? Old fashioned privacy is holding us back!
The stark choice posited between privacy and digital liberation is rarely examined with much intellectual rigor. Often, "privacy is dead" is just a tired fatalistic response to the latest breach or eye-popping digital development, like facial recognition, or a smartphone's location monitoring. In fact, those who earnestly assert that privacy is over are almost always trying to sell us something, be it sneakers, or a political ideology, or a wanton digital business model.
Is it really too late for privacy? Is the "genie out of the bottle"? Even if we accepted the ridiculous premise that privacy is at odds with progress, no it's not too late, for a couple of reasons. Firstly, the pessimism (or barely disguised commercial opportunism) generally confuses secrecy with privacy. And secondly, frankly, we ain't seen nothin' yet!
Technology certainly has laid us bare. Behavioral modeling, facial recognition, Big Data mining, natural language processing and so on have given corporations X-Ray vision into our digital lives. While exhibitionism has been cultivated and normalised by the infomopolists, even the most guarded social network users may be defiled by data prospectors who, without consent, upload their contact lists, pore over their photo albums, and mine their shopping histories.
So yes, a great deal about us has leaked out into what some see as an infinitely extended neo-public domain. And yet we can be public and retain our privacy at the same time. Just as we have for centuries of civilised life.
It's true that privacy is a slippery concept. The leading privacy scholar Daniel Solove once observed that "Privacy is a concept in disarray. Nobody can articulate what it means."
Some people seem defeated by privacy's definitional difficulties, yet information privacy is simply framed, and corresponding data protection laws are elegant and readily understood.
Information privacy is basically a state where those who know us are restrained in what they do with the knowledge they have about us. Privacy is about respect, and protecting individuals against exploitation. It is not about secrecy or even anonymity. There are few cases where ordinary people really want to be anonymous. We actually want businesses to know - within limits - who we are, where we are, what we've done and what we like ... but we want them to respect what they know, to not share it with others, and to not take advantage of it in unexpected ways. Privacy means that organisations behave as though it's a privilege to know us. Privacy can involve businesses and governments giving up a little bit of power.
Many have come to see privacy as literally a battleground. The grassroots Cryptoparty movement came together around the heady belief that privacy means hiding from the establishment. Cryptoparties teach participants how to use Tor and PGP, and they spread a message of resistance. They take inspiration from the Arab Spring where encryption has of course been vital for the security of protestors and organisers. One Cryptoparty I attended in Sydney opened with tributes from Anonymous, and a number of recorded talks by activists who ranged across a spectrum of political issues like censorship, copyright, national security and Occupy.
I appreciate where they're coming from, for the establishment has always overplayed its security hand, and run roughshod over privacy. Even traditionally moderate Western countries have governments charging like china shop bulls into web filtering and ISP data retention, all in the name of a poorly characterised terrorist threat. When governments show little sympathy for netizenship, and absolutely no understanding of how the web works, it's unsurprising that sections of society take up digital arms in response.
Yet going underground with encryption is a limited privacy stratagem, because do-it-yourself encryption is incompatible with the majority of our digital dealings. The most nefarious and least controlled privacy offences are committed not by government but by Internet companies, large and small. To engage fairly and squarely with businesses, consumers need privacy protections, comparable to the safeguards against unscrupulous merchants we enjoy, uncontroversially, in traditional commerce. There should be reasonable limitations on how our Personally Identifiable Information (PII) is used by all the services we deal with. We need department stores to refrain from extracting health information from our shopping habits, merchants to not use our credit card numbers as customer reference numbers, shopping malls to not track patrons by their mobile phones, and online social networks to not x-ray our photo albums by biometric face recognition.
Encrypting everything we do would only put it beyond reach of the companies we obviously want to deal with. Look for instance at how the cryptoparties are organised. Some cryptoparties manage their bookings via the US event organiser Eventbrite to which attendees have to send a few personal details. So ironically, when registering for a cryptoparty, you cannot use encryption!
The central issue is this: going out in public does not neutralise privacy. It never did in the physical world and it shouldn't be the case in cyberspace either. Modern society has long rested on balanced consumer protection regulations to curb the occasional excesses of business and government. Therefore we ought not to respond to online privacy invasions as if the digital economy is a new Wild West. We should not have to hide away if privacy is agreed to mean respecting the PII of customers, users and citizens, and restraining what data custodians do with that precious resource.
We're still in the early days of the social web, and the information innovation has really only just begun. There is incredible value to be extracted from mining the underground rivers of data coursing unseen through cyberspace, and refining that raw material into Personal Information.
Look at what the data prospectors and processors have managed to do already.
It's difficult to overstate the value of facial recognition to businesses like Facebook when they have just one asset: knowledge about their members and users. Combined with image analysis and content addressable graphical memory, facial recognition lets social media companies work out what we're doing, when, where and with whom. I call it piracy. Billions of everyday images have been uploaded over many years by users for ostensibly personal purposes, without any clue that technology would emerge to convert those pictures into a commercial resource.
Third party services like Facedeals are starting to emerge, using Facebook's photo resources for commercial facial recognition in public. And the most recent facial recognition entrepreneurs like Name Tag App boast of scraping images from any "public" photo databases they can find. But as we shall see below, in many parts of the world there are restrictions on leveraging public-facing databases, because there is a legal difference between anonymous data and identified information.
- Some of the richest stores of raw customer data are aggregated in retailer databases. The UK department store Tesco for example is said to hold more data about British citizens than the government does. For years of course data analysts have combed through shopping history for marketing insights, but their predictive powers are growing rapidly. An infamous example is Target's covert development of methods to identify customers who are pregnant based on their buying habits. Some Big Data practitioners seem so enamoured with their ability to extract secrets from apparently mundane data, they overlook that PII collected indirectly by algorithm is subject to privacy law just as if it was collected directly by questionnaire. Retailers need to remember this as they prepare to exploit their massive loyalty databases into new financial services ventures.
- And looking ahead, Google Glass in the privacy stakes will probably surpass both Siri and facial recognition. If actions speak louder than words, imagine the value to Google of seeing through Glass exactly what we do in real time. Digital companies wanting to know our minds won't need us to expressly "like" anything anymore; they'll be able to tell our preferences from our unexpurgated behaviours.
The surprising power of data protection regulations
There's a widespread belief that technology has outstripped privacy law, yet it turns out technology-neutral data privacy law copes well with most digital developments. OECD privacy principles (enacted in over 100 countries) and the US FIPPs (Fair Information Practice Principles) require that companies be transparent about what PII they collect and why, and limit the ways in which PII is used for unrelated purposes.
Privacy advocates can take heart from several cases where existing privacy regulations have proven effective against some of the infomopolies' trespasses. And technologists and cynics who think privacy is hopeless should heed the lessons.
- Google StreetView cars, while they drive up and down photographing the world, also collect Wi-Fi hub coordinates for use in geo-location services. In 2010 it was discovered that the StreetView software was also collecting unencrypted Wi-Fi network traffic, some of which contained Personal Information like user names and even passwords. Privacy Commissioners in Australia, Japan, Korea, the Netherlands and elsewhere found Google was in breach of their data protection laws. Google explained that the collection was inadvertent, apologized, and destroyed all the wireless traffic that had been gathered.
The nature of this privacy offence has confused some commentators and technologists. Some argue that Wi-Fi data in the public domain is not private, and "by definition" (so they like to say) categorically could not be private. Accordingly some believed Google was within its rights to do whatever it liked with such found data. But that reasoning fails to grasp the technicality that Data Protection laws in Europe, Australia and elsewhere do not essentially distinguish “public” from "private". In fact the word “private” doesn’t even appear in Australia’s “Privacy Act”. If data is identifiable, then privacy rights generally attach to it irrespective of how it is collected.
- Facebook photo tagging was ruled unlawful by European privacy regulators in mid 2012, on the grounds it represents a collection of PII (by the operation of the biometric matching algorithm) without consent. By late 2012 Facebook was forced to shut down facial recognition and tag suggestions in the EU. This was quite a show of force over one of the most powerful companies of the digital age. More recently Facebook has started to re-introduce photo tagging, prompting the German privacy regulator to reaffirm that this use of biometrics is counter to their privacy laws.
It's never too late
So, is it really too late for privacy? Outside the United States at least, established privacy doctrine and consumer protections have taken technocrats by surprise. They have found, perhaps counterintuitively, that they are not as free as they thought to exploit all personal data that comes their way.
Privacy is not threatened so much by technology as it is by sloppy thinking and, I'm afraid, by wishful thinking on the part of some vested interests. Privacy and anonymity, on close reflection, are not the same thing, and we shouldn't want them to be! It's clearly important to be known by others in a civilised society, and it's equally important that those who do know us are reasonably restrained in how they use that knowledge.
It's long been said that if you're getting something for free online, then you're not the customer, you're the product. It's a reference to the one-sided bargain for personal information that powers so many social businesses - the way that "infomopolies" as I call them exploit the knowledge they accumulate about us.
Now it's been revealed that we're even lower than product: we're lab rats.
Facebook data scientist Adam Kramer, with collaborators from UCSF and Cornell, this week reported on a study in which they tested how Facebook users respond psychologically to alternately positive and negative posts. Their experimental technique is at once ingenious and shocking. They took the real life posts of nearly 700,000 Facebook members, and manipulated them, turning them slightly up- or down-beat. And then Kramer et al. measured the emotional tone in how people reading those posts reacted in their own feeds. See Experimental evidence of massive-scale emotional contagion through social networks, Adam Kramer, Jamie Guillory & Jeffrey Hancock, in Proceedings of the National Academy of Sciences, v111.24, 17 June 2014.
The resulting scandal has been well-reported by many, including Kashmir Hill in Forbes, whose blog post nicely covers how the affair has unfolded, and includes a response by Adam Kramer himself.
Plenty has been written already about the dodgy (or non-existent) ethics approval, and the entirely contemptible claim that users gave "informed consent" to have their data "used" for research in this way. I want to draw attention here to Adam Kramer's unvarnished description of their motives. His response to the furore (provided by Hill in her blog) is, as she puts it, tone deaf. Kramer makes no attempt whatsoever at a serious scientific justification for this experiment:
- "The reason we did this research is because we care about the emotional impact of Facebook and the people that use our product ... [We] were concerned that exposure to friends’ negativity might lead people to avoid visiting Facebook."
That is, this large scale psychological experiment was simply for product development.
Some apologists have, I hear, countered that social network feeds are manipulated all the time, notably by advertisers, to produce emotional responses.
Now that's interesting, because for their A-B experiment, Kramer and his colleagues took great pains to make sure the subjects were unaware of the manipulation. After all, the results would be meaningless if people knew what they were reading had been emotionally fiddled with.
In contrast, the ad industry has always insisted that today's digital consumers are super savvy, and they know the difference between advertising and real-life. Advertising is therefore styled as just a bit of harmless fun. But this line is I think further exposed by the Facebook Experiment as self-serving mythology, crafted by people who are increasingly expert at covertly manipulating perceptions, and who now have the data, collected dishonestly, to prove it.
We live in an age where billionaires are self-made on the back of the most intangible of assets – the information they have amassed about us. That information used to be volunteered in forms and questionnaires and contracts but increasingly personal information is being observed and inferred.
The modern world is awash with data. It’s a new and infinitely re-usable raw material. Most of the raw data about us is an invisible by-product of our mundane digital lives, left behind by the gigabyte by ordinary people who do not perceive it let alone understand it.
Many Big Data and digital businesses proceed on the basis that all this raw data is up for grabs. There is a particularly widespread assumption that data in the "public domain" is free-for-all, and if you’re clever enough to grab it, then you’re entitled to extract whatever you can from it.
In the webinar, I'll try to show how some of these assumptions are naive. The public is increasingly alarmed about Big Data and averse to unbridled data mining. Excessive data mining isn't just subjectively 'creepy'; it can be objectively unlawful in many parts of the world. Conventional data protection laws turn out to be surprisingly powerful in the face of Big Data. Data miners ignore international privacy laws at their peril!
Today there are all sorts of initiatives trying to forge a new technology-privacy synthesis. They go by names like "Privacy Engineering" and "Privacy by Design". These are well meaning efforts but they can be a bit stilted. They typically overlook the strengths of conventional privacy law, and they can miss an opportunity to engage the engineering mind.
It’s not politically correct but I believe we must admit that privacy is full of contradictions and competing interests. We need to be more mature about privacy. Just as there is no such thing as perfect security, there can never be perfect privacy either. And this is where the professional engineering mindset should be brought in, to help deal with conflicting requirements.
If we’re serious about Privacy by Design and Privacy Engineering then we need to acknowledge the tensions. That’s some of the thinking behind Constellation's new Big Privacy compact. To balance privacy and Big Data, we need to hold a conversation with users that respects the stresses and strains, and involves them in working through the new privacy deal.
The webinar will cover these highlights of the Big Privacy pact:
- Respect and Restraint
- Super transparency
- And a fair deal for Personal Information.
Have a disruptive technology implementation story? Get recognised for your leadership. Apply for the 2014 SuperNova Awards for leaders in disruptive technology.
The latest Snowden revelations include the NSA's special programs for extracting photos from the Internet and identifying the people in them. Amongst other things the NSA uses their vast information resources to correlate location cues in photos -- buildings, streets and so on -- with satellite data, to work out where people are. They even search especially for passport photos, because these are better fodder for facial recognition algorithms. The audacity of these government surveillance activities continues to surprise us, and their secrecy is abhorrent.
Yet an ever greater scale of private sector surveillance has been going on for years in social media. With great pride, Facebook recently revealed its R&D in facial recognition. They showcased the brazenly named "DeepFace" biometric algorithm, which is claimed to be 97% accurate in recognising faces from regular images. Facebook has made a swaggering big investment in biometrics.
Data mining needs raw material, there's lots of it out there, and Facebook has been supremely clever at attracting it. It's been suggested that 20% of all photos now taken end up in Facebook. Even three years ago, Facebook held 10,000 times as many photographs as the Library of Congress.
And Facebook will spend big buying other photo lodes. Last year they tried to buy Snapchat for the spectacular sum of three billion dollars. The figure had pundits reeling. How could a start-up company with 30 people be worth so much? All the usual dot com comparisons were made; the offer seemed a flight of fancy.
But no, the offer was a rational consideration for the precious raw material that lies buried in photo data.
Snapchat generates at least 100 million new images every day. Three billion dollars was, pardon me, a snap. I figure that at a ballpark internal rate of return of 10%, a $3B investment is equivalent to $300M p.a. so even if the Snapchat volume stopped growing, Facebook would have been paying one cent for every new snap, in perpetuity.
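The back-of-envelope sum above can be checked in a few lines of Python. This is only a sketch under stated assumptions: the $3B price, the 10% rate of return and the 100 million snaps per day come from the text, while the 365-day year and the function name `cost_per_snap_cents` are assumptions introduced here for illustration.

```python
def cost_per_snap_cents(price_usd, annual_return, snaps_per_day, days_per_year=365):
    """Annualised cost per image in US cents, treating the price as a perpetuity."""
    # A lump sum at a fixed rate of return is equivalent to this much per year
    annual_equivalent = price_usd * annual_return
    # Assumes the daily volume stays flat, as in the text
    snaps_per_year = snaps_per_day * days_per_year
    return 100 * annual_equivalent / snaps_per_year


if __name__ == "__main__":
    cents = cost_per_snap_cents(3_000_000_000, 0.10, 100_000_000)
    print(f"about {cents:.2f} cents per snap")  # prints "about 0.82 cents per snap"
```

On these figures the result comes out a little under one cent per snap, consistent with the rounded figure quoted above.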
These days, we have learned from Snowden and the NSA that communications metadata is just as valuable as the content of our emails and phone calls. So remember that it's the same with photos. Each digital photo comes from a device that embeds metadata within the image, usually including the time and place the picture was taken. And of course each Instagram or Snapchat is a social post, sent by an account holder with a history and rich context in which the image yields intimate real time information about what they're doing, when and where.
Snapchat's privacy policy sets out what the service collects:
- When you access or use our Services, we automatically collect information about you, including:
- Usage Information: When you send or receive messages via our Services, we collect information about these messages, including the time, date, sender and recipient of the Snap. We also collect information about the number of messages sent and received between you and your friends and which friends you exchange messages with most frequently.
- Log Information: We log information about your use of our websites, including your browser type and language, access times, pages viewed, your IP address and the website you visited before navigating to our websites.
- Device Information: We may collect information about the computer or device you use to access our Services, including the hardware model, operating system and version, MAC address, unique device identifier, phone number, International Mobile Equipment Identity ("IMEI") and mobile network information. In addition, the Services may access your device's native phone book and image storage applications, with your consent, to facilitate your use of certain features of the Services.
Snapchat goes on to declare it may use any of this information to "personalize and improve the Services and provide advertisements, content or features that match user profiles or interests" and it reserves the right to share any information with "vendors, consultants and other service providers who need access to such information to carry out work on our behalf".
So back to the data mining: nothing stops Snapchat -- or a new parent company -- running biometric facial recognition over the snaps as they pass through the servers, to extract additional "profile" information. And there's a kicker that makes Snapchats extra valuable for biometric data miners. The vast majority of Snapchats are selfies. So if you extract a biometric template from a snap, you already know who it belongs to, without anyone having to tag it. Snapchat would provide a hundred million auto-calibrations every day for facial recognition algorithms! On Facebook, the privacy aware turn off photo tagging, but with Snapchats, self identification is inherent to the experience and is unlikely ever to be disabled.
As I've discussed before, the morbid thrill of Snowden's spying revelations has tended to overshadow his sober observations that when surveillance by the state is probably inevitable, we need to be discussing accountability.
While we're all ventilating about the NSA, it's time we also attended to private sector spying and properly debated the restraints that may be appropriate on corporate exploitation of social data.
Personally I'm much more worried that an infomopoly has all my selfies.
My Constellation Research colleague Alan Lepofsky has been working on new ways to characterise users in cyberspace. Frustrated with the oversimplified cliche of the "Digital Millennials", Alan has developed a fresh framework for categorizing users according to their comfort with technology and their actual knowledge of it. See his new research report "Segmenting Audiences by Digital Proficiency".
This sort of schema could help frame the answers to some vital open questions. In today's maelstrom of idealism and hyperbole, we're struggling to predict how things are going to turn out, and to build appropriate policies and management structures. We are still guessing how the digital revolution is really going to change the human condition. We're not yet rigorously measuring the sorts of true changes, if any, that the digital transformation is causing.
We hold such disparate views about cyberspace right now. When the Internet does good – for example through empowering marginalized kids at schools, fueling new entrepreneurship, or connecting disadvantaged communities – it is described as a power for good, a true "paradigm shift". But when it does bad – as when kids are bullied online or when phishing scams hook inexperienced users – then the Internet is said to be just another communications medium. Such inconsistent attitudes are with us principally because the medium is still so new. Yet we all know how important it is, and that far reaching policy decisions are being made today. So it’s good to see new conceptual frameworks for analyzing the range of ways that people engage with and utilise the Internet.
Vast fortunes are being made through online business models that purport to feed a natural hunger to be social. With its vast reach and zero friction, the digital medium might radically amplify aspects of the social drive, quite possibly beyond what nature intended. As supremely communal beings, we humans have evolved elaborate social bearings for getting on in diverse groups, and we've built social conventions that govern how we meet, collaborate, refer, partner, buy and sell, amalgamate, marry, and split. We are incredibly adept at reading body language, spotting untruths, and gaming each other for protection or for personal advantage. In cyberspace, few of the traditional cues are available to us; we literally lose our bearings online. And therefore naive Internet users fall prey to spam, fake websites and all manner of scams.
How are online users adapting to their new environment and evolving new instincts? I expect there will be interesting correlations between digital resilience and the sophistication measures in Alan’s digital proficiency framework. We might expect Digital Natives to be better equipped inherently to detect and respond to online threats, although they might be somewhat more at risk by virtue of being more active. I wonder too if the risk-taking behavior which exacerbates some online risks for adolescents would be relatively more common amongst Digital Immigrants? By the same token, the Digital Skeptics who are knowledgeable yet uncomfortable may be happy staying put in that quadrant, or only venturing out for selected cyber activities, because they’re consciously managing their digital exposure.
We certainly do need new ways like Alan's Digital Proficiency Framework to understand society’s complex "Analog to Digital" conversion. I commend it to you.
I've just completed a major new Constellation Research report looking at how today's privacy practices cope with Big Data. The report draws together my longstanding research on the counter-intuitive strengths of technology-neutral data protection laws, and melds it with my new Constellation colleagues' vast body of work in data analytics. The synergy is honestly exciting and illuminating.
Big Data promises tremendous benefits for a great many stakeholders but the potential gains are jeopardised by the excesses of a few. Some cavalier online businesses are propelled by a naive assumption that data in the "public domain" is up for grabs, and with that they often cross a line.
For example, there are apps and services now that will try to identify pictures you take of strangers in public, by matching them biometrically against data supersets compiled from social networking sites and other publicly accessible databases. Many find such offerings quite creepy but they may be at a loss as to what to do about it, or even how to think through the issues objectively. Yet the very metaphor of data mining holds some of the clues. If, as some say, raw data is like crude oil, just waiting to be mined and exploited by enterprising prospectors, then surely there are limits, akin to mining permits?
Many think the law has not kept pace with technology, and that digital innovators are free to do what they like with any data they can get their hands on. But technologists repeatedly underestimate the strength of conventional data protection laws and regulations. The extraction of PII from raw data may be interpreted under technology neutral privacy principles as an act of Collection and as such is subject to existing statutes. Around the world, Google thus found they are not actually allowed to gather Personal Data that happens to be available in unencrypted Wi-Fi transmission as StreetView cars drive by homes and offices. And Facebook found they are not actually allowed to automatically identify people in photos through face recognition without consent. And Target probably would find, if they tried it outside the USA, that they cannot flag selected female customers as possibly pregnant by analysing their buying habits.
On the other hand, orthodox privacy policies and static user agreements do not cater for the way personal data can be conjured tomorrow from raw data collected today. Traditional privacy regimes require businesses to spell out what personally identifiable information (PII) they collect and why, and to restrict secondary usage. Yet with Big Data, with the best will in the world, a company might not know what data analytics will yield down the track. If mutual benefits for business and customer alike might be uncovered, a freeze-frame privacy arrangement may be counter-productive.
Thus the fit between data analytics and data privacy standards is complex and sometimes surprising. While existing laws are not to be underestimated, we do need something new. As far as I know it was Ray Wang in his Harvard Business Review blog who first called for a fresh privacy compact amongst users and businesses.
The spirit of data privacy is simply framed: organisations that know us should respect the knowledge they have, they should be open about what they know, and they should be restrained in what they do with it. In the Age of Big Data, let's have businesses respect the intelligence they extract from data mining, just as they should respect the knowledge they collect directly through forms and questionnaires.
I like the label "Big Privacy"; it is grandly optimistic, like "Big Data" itself, and at the same time implies a challenge to do better than regular privacy practices.
Ontario Privacy Commissioner Dr Ann Cavoukian writes about Big Privacy, describing it simply as "Privacy By Design writ large". But I think there must be more to it than that. Big Data is quantitatively but also qualitatively different from ordinary data analysis.
To summarise the basic elements of a Big Data compact:
- Respect and Restraint: In the face of Big Data’s temptations, remember that privacy is not only about what we do with PII; just as important is what we choose not to do.
- Super transparency: Who knows what lies ahead in Big Data? If data privacy means being open about what PII is collected and why, then advanced privacy means going further, telling people more about the business models and the sorts of results data mining is expected to return.
- Engage customers in a fair deal for PII: Information businesses ought to set out what PII is really worth to them (especially when it is extracted in non-obvious ways from raw data) and offer a fair "price" for it, whether in the form of "free" products and services, or explicit payment.
- Really innovate in privacy: There’s a common refrain that “privacy hampers innovation” but often that's an intellectually lazy cover for reserving the right to strip-mine PII. Real innovation lies in business practices which create and leverage PII while honoring privacy principles.
My report, "Big Privacy" Rises to the Challenges of Big Data, may be downloaded from the Constellation Research website.
Yesterday it was reported by The Verge that anonymous hackers have accessed Snapchat's user database and posted 4.6 million user names and phone numbers. In an apparent effort to soften the blow, two digits of the phone numbers were redacted. So we might assume this is a "white hat" exercise, designed to shame Snapchat into improving their security. Indeed, a few days ago Snapchat themselves said they had been warned of vulnerabilities in their APIs that would allow a mass upload of user records.
The response of many has been, well, so what? Some people have casually likened Snapchat's list to a public White Pages; others have played it down as "just email addresses".
Let's look more closely. The leaked list was not in fact public names and phone numbers; it was user names and phone numbers. User names might often be email addresses but these are typically aliases; people frequently choose email addresses that reveal little or nothing of their real world identity. We should assume there is intent in an obscure email address for the individual to remain secret.
Identity theft has become a highly organised criminal enterprise. Crime gangs patiently acquire multiple data sets over many months, sometimes years, gradually piecing together detailed personal profiles. It's been shown time and time again by privacy researchers (perhaps most notably Latanya Sweeney) that re-identification is enabled by linking diverse data sets. And for this purpose, email addresses and phone numbers are superbly valuable indices for correlating an individual's various records. Your email address is common across most of your social media registrations. And your phone number allows your real name and street address to be looked up from reverse White Pages. So the Snapchat breach could be used to join aliases or email addresses to real names and addresses via the phone numbers. For a social engineering attack on a call centre -- or even to open a new bank account -- an identity thief can go an awful long way with real name, street address, email address and phone number.
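The linkage step described above can be illustrated with a minimal sketch. All records below are invented for illustration, and the names `leaked_accounts`, `reverse_directory` and `link_by_phone` are hypothetical. The sketch also ignores the redacted digits, which a real attacker would have to work around.

```python
# Hypothetical, made-up records: a leaked list of aliases with phone numbers ...
leaked_accounts = [
    {"username": "nightowl77", "phone": "555-0101"},
    {"username": "quiet.reader", "phone": "555-0102"},
    {"username": "no_match_here", "phone": "555-0199"},
]

# ... and a public reverse directory keyed by phone number.
reverse_directory = {
    "555-0101": {"name": "Alex Example", "address": "1 Sample St"},
    "555-0102": {"name": "Sam Placeholder", "address": "2 Demo Ave"},
}


def link_by_phone(accounts, directory):
    """Join alias records to directory entries using the phone number as the index."""
    linked = []
    for account in accounts:
        entry = directory.get(account["phone"])
        if entry is not None:
            # Merge the alias record with the real-world name and address.
            linked.append({**account, **entry})
    return linked


profiles = link_by_phone(leaked_accounts, reverse_directory)
for profile in profiles:
    print(profile["username"], "->", profile["name"], "at", profile["address"])
```

Neither data set alone ties an alias to a street address; the shared phone number does the joining.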
I was asked in an interview to compare the theft of phone numbers with the theft of social security numbers. I surprised the interviewer when I said phone numbers are probably even more valuable to the highly organised ID thief, for they can be used to index names in public directories, and to link different data sets, in ways that SSNs (or credit card numbers for that matter) cannot.
So let us start to treat all personal information -- especially when aggregated in bulk -- more seriously! And let's be more cautious in the way we categorise personal or Personally Identifiable Information (PII).
Importantly, most regulatory definitions of PII already embody the proper degree of caution. Look carefully at the US government definition of Personally Identifiable Information:
- information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual (underline added).
This means that items of data can constitute PII if other data can be combined to identify the person concerned. That is, the fragments are regarded as PII even if it is the whole that does the identifying.
And remember that the middle I in PII stands for Identifiable, and not, as many people presume, Identifying. To meet the definition of PII, data need not uniquely identify a person, it merely needs to be directly or indirectly identifiable with a person. And this is how it should be when we heed the way information technologies enable identification through linkages.
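The point that fragments become identifying in combination can be made concrete with a small sketch. The records and field names below are invented, and `matching_count` is a hypothetical helper that counts how many people remain consistent with what an observer already knows.

```python
# Invented records: no field is a name, yet combinations can single people out.
records = [
    {"postcode": "2000", "birth_year": 1980, "gender": "F"},
    {"postcode": "2000", "birth_year": 1980, "gender": "M"},
    {"postcode": "2000", "birth_year": 1975, "gender": "F"},
    {"postcode": "2050", "birth_year": 1980, "gender": "F"},
]


def matching_count(rows, **known):
    """Count the rows consistent with every attribute value the observer knows."""
    return sum(all(row[k] == v for k, v in known.items()) for row in rows)


# Each fragment alone leaves several candidates ...
print(matching_count(records, postcode="2000"))   # 3
print(matching_count(records, birth_year=1980))   # 3
print(matching_count(records, gender="F"))        # 3
# ... but together they narrow the field to one individual.
print(matching_count(records, postcode="2000", birth_year=1980, gender="F"))  # 1
```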
Almost anywhere else in the world, data stores like Snapchat's would automatically fall under data protection and information privacy laws; regulators would take a close look at whether the company had complied with the OECD Privacy Principles, and whether Snapchat's security measures were fit for purpose given the PII concerned. But in the USA, companies and commentators alike still have trouble working out how serious these breaches are. Each new breach is treated in an ad hoc manner, often with people finessing the difference between credit card numbers -- as in the recent Target breach -- and "mere" email addresses like those in the Snapchat and Epsilon episodes.
Surely the time has come to simply give proper regulatory protection to all PII.
Facebook's challenge to the Collection Limitation Principle
An extract from our chapter in the forthcoming Encyclopedia of Social Network Analysis and Mining (to be published by Springer in 2014).
Stephen Wilson, Lockstep Consulting, Sydney, Australia.
Anna Johnston, Salinger Privacy, Sydney, Australia.
- Facebook's business practices pose a risk of non-compliance with the Collection Limitation Principle (OECD Privacy Principle No. 1, and corresponding Australian National Privacy Principles NPP 1.1 through 1.4).
- Privacy problems will likely remain while Facebook's business model remains unsettled, for the business is largely based on collecting and creating as much Personal Information as it can, for subsequent and as yet unspecified monetization.
- If an OSN business doesn't know how it is eventually going to make money from Personal Information, then it has a fundamental difficulty with the Collection Limitation principle.
Facebook is an Internet and societal phenomenon. Launched in 2004, in just a few years it has claimed a significant proportion of the world's population as regular users, becoming by far the most dominant Online Social Network (OSN). With its success has come a good deal of controversy, especially over privacy. Does Facebook herald a true shift in privacy values? Or, despite occasional reckless revelations, are most users no more promiscuous than they were eight years ago? We argue it's too early to draw conclusions about society as a whole from the OSN experience to date. In fact, under laws that currently stand, many OSNs face a number of compliance risks in dozens of jurisdictions.
Over 80 countries worldwide now have enacted data privacy laws, around half of which are based on privacy principles articulated by the OECD. Amongst these are the Collection Limitation Principle which requires businesses to not gather more Personal Information than they need for the tasks at hand, and the Use Limitation Principle which dictates that Personal Information collected for one purpose not be arbitrarily used for others without consent.
Overt collection, covert collection (including generation) and "innovative" secondary use of Personal Information are the lifeblood of Facebook. While Facebook's founder would have us believe that social mores have changed, a clash with orthodox data privacy laws creates challenges for the OSN business model in general.
This article examines a number of areas of privacy compliance risk for Facebook. We focus on how Facebook collects Personal Information indirectly, through the import of members' email address books for "finding friends", and by photo tagging. Taking Australia's National Privacy Principles from the Privacy Act 1988 (Cth) as our guide, we identify a number of potential breaches of privacy law, and issues that may be generalised across all OECD-based privacy environments.
Australian law tends to use the term "Personal Information" rather than "Personally Identifiable Information" although they are essentially synonymous for our purposes.
Terms of reference: OECD Privacy Principles and Australian law
The Organisation for Economic Cooperation and Development has articulated eight privacy principles for helping to protect personal information. The OECD Privacy Principles are as follows:
- 1. Collection Limitation Principle
- 2. Data Quality Principle
- 3. Purpose Specification Principle
- 4. Use Limitation Principle
- 5. Security Safeguards Principle
- 6. Openness Principle
- 7. Individual Participation Principle
- 8. Accountability Principle
Of most interest to us here are principles one and four:
- Collection Limitation Principle: There should be limits to the collection of personal data and any such data should be obtained by lawful and fair means and, where appropriate, with the knowledge or consent of the data subject.
- Use Limitation Principle: Personal data should not be disclosed, made available or otherwise used for purposes other than those specified in accordance with [the Purpose Specification] except with the consent of the data subject, or by the authority of law.
At least 89 countries have some sort of data protection legislation in place [Greenleaf, 2012]. Of these, in excess of 30 jurisdictions have derived their particular privacy regulations from the OECD principles. One example is Australia.
We will use Australia's National Privacy Principles (NPPs) in the Privacy Act 1988 as our terms of reference for analysing some of Facebook's systemic privacy issues. In Australia, Personal Information is defined as: information or an opinion (including information or an opinion forming part of a database), whether true or not, and whether recorded in a material form or not, about an individual whose identity is apparent, or can reasonably be ascertained, from the information or opinion.
Indirect collection of contacts
One of the most significant collections of Personal Information by Facebook is surely the email address book of those members that elect to have the site help "find friends". This facility provides Facebook with a copy of all contacts from the address book of the member's nominated email account. It's the very first thing that a new user is invited to do when they register. Facebook refer to this as "contact import" in the Data Use Policy (accessed 10 August 2012).
"Find friends" is curtly described as "Search your email for friends already on Facebook". A link labelled "Learn more" in fine print leads to the following additional explanation:
- "Facebook won't share the email addresses you import with anyone, but we will store them on your behalf and may use them later to help others search for people or to generate friend suggestions for you and others. Depending on your email provider, addresses from your contacts list and mail folders may be imported. You should only import contacts from accounts you've set up for personal use." [underline added by us].
Without any further elaboration, new users are invited to enter their email address and password if they have a cloud based email account (such as Hotmail, gmail, Yahoo and the like). These types of services have an API through which any third party application can programmatically access the account, after presenting the user name and password.
It is entirely possible that casual users will not fully comprehend what is happening when they opt in to have Facebook "find friends". Further, there is no indication that, by default, imported contact details are shared with everyone. The underlined text in the passage quoted above shows Facebook reserves the right to use imported contacts to make direct approaches to people who might not even be members.
Importing contacts represents an indirect collection by Facebook of Personal Information of others, without their authorisation or even knowledge. The short explanatory information quoted above is not provided to the individuals whose details are imported and therefore does not constitute a Collection Notice. Furthermore, it leaves the door open for Facebook to use imported contacts for other, unspecified purposes. The Data Use Policy imposes no limitations as to how Facebook may make use of imported contacts.
Privacy harms are possible in social networking if members blur the distinction between work and private lives. Recent research has pointed to the risky use of Facebook by young doctors, involving inappropriate discussion of patients [Moubarak et al, 2010]. Even if doctors are discreet in their online chat, we are concerned that they may run foul of the Find Friends feature exposing their connections to named patients. Doctors on Facebook who happen to have patients in their web mail address books can have associations between individuals and their doctors become public. In mental health, sexual health, family planning, substance abuse and similar sensitive fields, naming patients could be catastrophic for them.
While most healthcare professionals may use a specific workplace email account which would not be amenable to contacts import, many allied health professionals, counselors, specialists and the like run their sole practices as small businesses, and naturally some will use low cost or free cloud-based email services. Note that the substance of a doctor's communications with their patients over web mail is not at issue here. The problem of exposing associations between patients and doctors arises simply from the presence of a name in an address book, even if the email was only ever used for non-clinical purposes such as appointments or marketing.
Photo tagging and biometric facial recognition
One of Facebook's most "innovative" forms of Personal Information Collection would have to be photo tagging and the creation of biometric facial recognition templates.
Photo tagging and "face matching" have been available in social media for some years now. On photo sharing sites such as Picasa, this technology "lets you organize your photos according to the people in them" in the words of the Picasa help pages. But in more complicated OSN settings, biometrics has enormous potential to both enhance the services on offer and to breach privacy.
In thinking about facial recognition, we start once more with the Collection Principle. Importantly, nothing in the Australian Privacy Act circumscribes the manner of collection; no matter how a data custodian comes to be in possession of Personal Information (being essentially any data about a person whose identity is apparent) they may be deemed to have collected it. When one Facebook member tags another in a photo on the site, then the result is that Facebook has overtly but indirectly collected PI about the tagged person.
Facial recognition technologies are deployed within Facebook to allow its servers to automatically make tag suggestions; in our view this process constitutes a new type of Personal Information Collection, on a potentially vast scale.
Biometric facial recognition works by processing image data to extract certain distinguishing features (like the separation of the eyes, nose, ears and so on) and computing a numerical data set known as a template that is highly specific to the face, though not necessarily unique. Facebook's online help indicates that they create templates from multiple tagged photos; if a user removes a tag from one of their photos, that image is not used in the template.
Facebook subsequently makes tag suggestions when a member views photos of their friends. They explain the process thus:
- "We are able to suggest that your friend tag you in a picture by scanning and comparing your friend's pictures to information we've put together from the other photos you've been tagged in".
So we see that Facebook must be more or less continuously checking images from members' photo albums against its store of facial recognition templates. When a match is detected, a tag suggestion is generated and logged, ready to be displayed next time the member is online.
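As a rough illustration of the matching process inferred above, here is a minimal sketch. It is not Facebook's method, which is not public: the feature vectors, the averaging of tagged photos into a template, the Euclidean distance measure and the `THRESHOLD` value are all assumptions chosen for illustration.

```python
import math

THRESHOLD = 0.5  # assumed cut-off; real systems would tune this empirically


def build_template(tagged_vectors):
    """Average the feature vectors from a member's tagged photos into one template."""
    n = len(tagged_vectors)
    return [sum(component) / n for component in zip(*tagged_vectors)]


def suggest_tag(face_vector, templates):
    """Return the member whose template is nearest, if it is within THRESHOLD."""
    best_name, best_distance = None, float("inf")
    for name, template in templates.items():
        distance = math.dist(face_vector, template)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= THRESHOLD else None


# Made-up feature vectors standing in for measurements extracted from faces.
templates = {
    "member_a": build_template([[1.0, 2.0, 3.0], [1.2, 2.2, 2.8]]),
    "member_b": build_template([[5.0, 5.0, 5.0], [5.2, 4.8, 5.0]]),
}

print(suggest_tag([1.1, 2.0, 3.0], templates))  # close to member_a's template
print(suggest_tag([9.0, 9.0, 9.0], templates))  # no template is near, so no suggestion
```

The privacy-relevant step is the return value of `suggest_tag`: a name has been attached to a previously unlabelled face, whether or not the suggestion is ever shown to anyone.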
What concerns us is that the proactive creation of biometric matches constitutes a new type of PI Collection, for Facebook must be attaching names -- even tentatively, as metadata -- to photos. This is a covert and indirect process.
Photos of anonymous strangers are not Personal Information, but metadata that identifies people in those photos most certainly is. Thus facial recognition is converting hitherto anonymous data -- uploaded in the past for personal reasons unrelated to photo tagging let alone covert identification -- into Personal Information.
Facebook limits the ability to tag photos to members who are friends of the target. This is purportedly a privacy enhancing feature, but unfortunately Facebook has nothing in its Data Use Policy to limit the use of the biometric data compiled through tagging. Restricting tagging to friends is likely to actually benefit Facebook for it reduces the number of specious or mischievous tags, and it probably enhances accuracy by having faces identified only by those who know the individuals.
A fundamental clash with the Collection Limitation Principle
In Australian privacy law, as with the OECD framework, the first and foremost privacy principle concerns Collection. Australia's National Privacy Principle NPP 1 requires that an organisation refrain from collecting Personal Information unless (a) there is a clear need to collect that information; (b) the collection is done by fair means, and (c) the individual concerned is made aware of the collection and the reasons for it.
The core business model of many Online Social Networks is to take advantage of Personal Information, in many and varied ways. From the outset, Facebook founder, Mark Zuckerberg, appears to have been enthusiastic for information built up in his system to be used by others. In 2004, he told a colleague "if you ever need info about anyone at Harvard, just ask" (as reported by Business Insider). Since then, Facebook has experienced a string of privacy controversies, including the "Beacon" sharing feature in 2007, which automatically imported members' activities on external websites and re-posted the information on Facebook for others to see.
Facebook's privacy missteps are characterised by the company using the data it collects in unforeseen and barely disclosed ways. Yet this is surely what Facebook's investors expect the company to be doing: innovating in the commercial exploitation of personal information. The company's huge market valuation derives from a widespread faith in the business community that Facebook will eventually generate huge revenues. An inherent clash with privacy arises from the fact that Facebook is a pure play information company: its only significant asset is the information it holds about its members. There is a market expectation that this asset will be monetized and maximised. Logically, anything that checks the network's flux in Personal Information -- such as the restraints inherent in privacy protection, whether adopted from within or imposed from without -- must affect the company's futures.
Perhaps the toughest privacy dilemma for innovation in commercial Online Social Networking is that these businesses still don't know how they are going to make money from their Personal Information lode. Even if they wanted to, they cannot tell what use they will eventually make of it, and so a fundamental clash with the Collection Limitation Principle remains.
An earlier version of this article was originally published by LexisNexis in the Privacy Law Bulletin (2010).
- Greenleaf G., "Global Data Privacy Laws: 89 Countries, and Accelerating", Privacy Laws & Business International Report, Issue 115, Special Supplement, February 2012 Queen Mary School of Law Legal Studies Research Paper No. 98/2012
- Moubarak G., Guiot A. et al "Facebook activity of residents and fellows and its impact on the doctor--patient relationship" J Med Ethics, 15 December 2010
No it doesn't, it only means the end of anonymity.
Anonymity is not the same thing as privacy. Anonymity keeps people from knowing what you're doing, and it's a vitally important quality in many settings. But in general we usually want people (at least some people) to know what we're up to, so long as they respect that knowledge. That's what privacy is all about. Anonymity is a terribly blunt instrument for protecting privacy, and it's also fragile. If anonymity is all you have, then you're in deep trouble when someone manages to defeat it.
New information technologies have clearly made anonymity more difficult, yet it does not follow that we must lose our privacy. Instead, these developments bring into stark relief the need for stronger regulatory controls that compel restraint in the way third parties deal with Personal Information that comes into their possession.
A great example is Facebook's use of facial recognition. When Facebook members innocently tag one another in photos, Facebook creates biometric templates with which it then automatically processes all photo data (previously anonymous), looking for matches. This is how they can create tag suggestions, but Facebook is notoriously silent on what other applications it has for facial recognition. Now and then we get a hint, with, for example, news of the Facedeals start up last year. Facedeals accesses Facebook's templates (under conditions that remain unclear) and uses them to spot customers as they enter a store to automatically check them in. It's classic social technology: kinda sexy, kinda creepy, but clearly in breach of Collection, Use and Disclosure privacy principles.
And indeed, European regulators have found that Facebook's facial recognition program is unlawful. The chief problem is that Facebook never properly disclosed to members what goes on when they tag one another, and they never sought consent to create biometric templates with which to subsequently identify people throughout their vast image stockpiles. Facebook has been forced to shut down their facial recognition operations in Europe, and they've destroyed their historical biometric data.
So privacy regulators in many parts of the world have real teeth. They have proven that re-identification of anonymous data by facial recognition is unlawful, and they have managed to stop a very big and powerful company from doing it.
This is how we should look at the implications of the DNA 'hacking'. Indeed, Melissa Gymrek from the Whitehead Institute said in an interview: "I think we really need to learn to deal with the fact that we cannot ever make data sets truly anonymous, and that I think the key will be in regulating how we are allowed to use this genetic data to prevent it from being used maliciously."
Perhaps this episode will bring even more attention to the problem in the USA, and further embolden regulators to enact broader privacy protections there. Perhaps the very extremeness of the DNA hacking does not spell the end of privacy so much as its beginning.