Lockstep

Mobile: +61 (0) 414 488 851
Email: swilson@lockstep.com.au

Too smart?

An Engadget report today, "Hangouts eavesdrops on your chats to offer 'smart suggestions'" describes a new "spy/valet" feature being added to Google's popular video chat tool.

  • "Google's Hangouts is gaining a handy, but slightly creepy new feature today. The popular chat app will now act as a digital spy-slash-valet by eavesdropping on your conversations to offer 'smart suggestions.' For instance, if a pal asks 'where are you?' it'll immediately prompt you to share your location, then open a map so you can pin it precisely."

It's sad that this sort of thing still gets meekly labeled as "creepy". The privacy implications are serious and pretty easy to see.

Google is evidently processing the text of Hangouts as they fly through their system, extracting linguistic cues, interpreting what's being said using Artificial Intelligence, extracting new meaning and insights, and offering suggestions.

We need some clarification about whether any covert tests of this technology have been undertaken during the R&D phase. A company obviously doesn't launch a new product like this without a lot of research, feasibility testing, prototyping and testing. Serious work on 'smart suggestions' would not start without first testing how it works in real life. So I wonder if any of this evaluation was done covertly on live data? Are Google researchers routinely eavesdropping on hangouts to develop the 'smart suggestions' technology?

If so, is such data usage covered by their Privacy Policy (you know, under the usual "in order to improve our service" justification)? And is usage sanctioned internationally in the stricter privacy regimes?

Many people have said to me I'm jumping the gun, and that Google would probably test the new Hangouts feature on its own employees. Perhaps, but given that scanning gmails is situation normal for Google, and they have a "privacy" culture that joins up all their business units so that data may be re-purposed almsot without limit, I feel sure that running AI algorithms on text without telling people would be par for the course.

In development and in operation, we need to know what steps are taken to protect the privacy of hangout data. What personally identifiable data and metadata is retained for other purposes? Who inside Google is granted access to the data and especially the synthtised insights? How long does any secondary usage persist for? Are particularly sensitive matters (like health data, financial details, corporate intellectual property etc.) filtered out?

This is well beyond "creepy". Hangouts and similar video chat are certainly wonderdful technologies. We're using them routinely for teaching, education, video conferencing, collaboration and consultation. The tools may become entrenched in corporate meetings, telecommuting, healthcare and the professions. But if I am talking with my doctor, or discussing patents with my legal team, or having a clandestine chat with a lover, I clearly do not want any unsolicited contributions from the service provider. More fundamentally, I want assurance that no machine is ever tapping into these sorts of communications, running AI algorithms, and creating new insights. If I'm wrong about covert testing on live data, then Google could do what Apple did and publish an Open Letter clarifying their data usage practices and strategies.

Come to think of it, if Google is running natural language processing algorithms over the Hangouts stream, might they be augmenting their gmail scanning the same way? Their business model is to extract insights about users from any data they get their hands on. Until now it's been a crude business of picking out keywords and using them to profile users' interests and feed targeted advertising. But what if they could get deeper information about us through AI? Is there any sign from their historical business practices that Google would not do this? And what if they can extract sensitive information like mental health indications? Even with good intent and transarency, predicting healthcare from social media is highly problematic as shown by the "Samaritans Radar" experience.

Artificial Intelligence is one of the new frontiers. Hot on the heels of the successes of IBM Watson, we're seeing Natural Language Processing and analytics rapidly penetrate business and now consumer applications. Commentators are alternately telling us that AI will end humanity, and not to worry about it. For now, I call on people to simply think clearly through the implications, such as for privacy. If AI programs are clever enough to draw deep insights about us from what we say, then the "datapreneurs" in charge of those algorithms need to remember they are just as accountable for privacy as if they have asked us reveal all by filling out a questionnaire.

Posted in Social Networking, Social Media, Privacy, Internet, Big Data

The Prince of Data Mining

Facial recognition is digital alchemy. It's the prince of data mining.

Facial recognition takes previously anonymous images and conjures peoples' identities. It's an invaluable capability. Once they can pick out faces in crowds, trawling surreptitiously through anyone and everyone's photos, the social network businesses can work out what we're doing, when and where we're doing it, and who we're doing it with. The companies figure out what we like to do without us having to 'like' or favorite anything.

So Google, Facebook, Apple at al have invested hundreds of megabucks in face recognition R&D and buying technology start-ups. And they spend billions of dollars buying images and especially faces, going back to Google's acquisition of Picasa in 2004, and most recently, Facebook's ill-fated $3 billion offer for Snapchat.

But if most people find face recognition rather too creepy, then there is cause for optimism. The technocrats have gone too far. What many of them still don't get is this: If you take anonymous data (in the form of photos) and attach names to that data (which is what Facebook photo tagging does - it guesses who people are in photos are, attaches putative names to records, and invites users to confirm them) then you Collect Personal Information. Around the world, existing pre-biometrics era black letter Privacy Law says you can't Collect PII even indirectly like that without am express reason and without consent.

When automatic facial recognition converts anonymous data into PII, it crosses a bright line in the law.

Posted in Social Networking, Privacy, Biometrics, Big Data

Crowd sourcing private sector surveillance

A repeated refrain of cynics and “infomopolists” alike is that privacy is dead. People are supposed to know that anything on the Internet is up for grabs. In some circles this thinking turns into digital apartheid; some say if you’re so precious about your privacy, just stay offline.

But socialising and privacy are hardly mutually exclusive; we don’t walk around in public with our names tattooed on our foreheads. Why can’t we participate in online social networks in a measured, controlled way without submitting to the operators’ rampant X-ray vision? There is nothing inevitable about trading off privacy for conviviality.

The privacy dangers in Facebook and the like run much deeper than the self-harm done by some peoples’ overly enthusiastic sharing. Promiscuity is actually not the worst problem, neither is the notorious difficulty of navigating complex and ever changing privacy settings.

The advent of facial recognition presents far more serious and subtle privacy challenges.

Facebook has invested heavily in face recognition technology, and not just for fun. Facebook uses it in effect to crowd-source the identification and surveillance of its members. With facial recognition, Facebook is building up detailed pictures of what people do, when, where and with whom.

You can be tagged without consent in a photo taken and uploaded by a total stranger.

The majority of photos uploaded to personal albums over the years were not intended for anything other than private viewing.

Under the privacy law of Australia and data protection regulations in dozens of other jurisdictions, what matters is whether data is personally identifiable. The Commonwealth Privacy Act 1988 (as amended in 2014) defines “Personal Information” as: “information or an opinion about an identified individual, or an individual who is reasonably identifiable”.

Whenever Facebook attaches a member’s name to a photo, they are converting hitherto anonymous data into Personal Information, and in so doing, they become subject to privacy law. Automated facial recognition represents an indirect collection of Personal Information. However too many people still underestimate the privacy implications; some technologists naively claim that faces are “public” and that people can have no expectation of privacy in their facial images, ignoring that information privacy as explained is about the identifiability and identification of data; the words “public” and “private” don’t even figure in the Privacy Act!

If a government was stealing into our photo albums, labeling people and profiling them, there would be riots. It's high time that private sector surveillance - for profit - is seen for what it is, and stopped.

Posted in Social Networking, Social Media, Privacy, Biometrics

Four Corners' 'Privacy Lost': A demonstration of the Collection Principle

Tonight, Australian Broadcasting Corporation’s Four Corners program aired a terrific special, "Privacy Lost" written and produced by Martin Smith from the US public broadcaster PBS’s Frontline program.

Here we have a compelling demonstration of the importance and primacy of Collection Limitation for protecting our privacy.

UPDATE: The program we saw in Australia turns out to be a condensed version of PBS's two part The United States of Secrets from May 2014.

About the program

Martin Smith summarises brilliantly what we know about the NSA’s secret surveillance programs, thanks to the revelations of Ed Snowden, the Guardian’s Glenn Greenwald and the Washington Post’s Barton Gellman; he holds many additional interviews with Julia Angwin (author of “Dragnet Nation”), Chris Hoofnagle (UC Berkeley), Steven Levy (Wired), Christopher Soghoian (ACLU) and Tim Wu (“The Master Switch”), to name a few. Even if you’re thoroughly familiar with the Snowden story, I highly recommend “Privacy Lost” or the original "United States of Secrets" (which unlike the Four Corners edition can be streamed online).

The program is a ripping re-telling of Snowden’s expose, against the backdrop of George W. Bush’s PATRIOT Act and the mounting suspicions through the noughties of NSA over-reach. There are freshly told accounts of the intrigues, of secret optic fibre splitters installed very early on in AT&T’s facilities, scandals over National Security Letters, and the very rare case of the web hosting company Calyx who challenged their constitutionality (and yet today, with the letter withdrawn, remains unable to tell us what the FBI was seeking). The real theme of Smith’s take on surveillance then emerges, when he looks at the rise of data-driven businesses -- first with search, then advertising, and most recently social networking -- and the “data wars” between Google, Facebook and Microsoft.

In my view, the interplay between government surveillance and digital businesses is the most important part of the Snowden epic, and it receives the proper emphasis here. The depth and breadth of surveillance conducted by the private sector, and the insights revealed about what people might be up to creates irresistible opportunities for the intelligence agencies. Hoofnagle tells us how the FBI loves Facebook. And we see the discovery of how the NSA exploits the tracking that’s done by the ad companies, most notably Google’s “PREF” cookie.

One of the peak moments in “Privacy Lost” comes when Gellman and his specialist colleague Ashkan Soltani present their evidence about the PREF cookie to Google – offering an opportunity for the company to comment before the story is to break in the Washington Post. The article ran on December 13, 2013; we're told it was then the true depth of the privacy problem was revealed.

My point of view

Smith takes as a given that excessive intrusion into private affairs is wrong, without getting into the technical aspects of privacy (such as frameworks for data protection, and various Privacy Principles). Neither does he unpack the actual privacy harms. And that’s fine -- a TV program is not the right place to canvass such technical arguments.

When Gellman and Soltani reveal that the NSA is using Google’s tracking cookie, the government gets joined irrefutably to the private sector in a mass surveillance apparatus. And yet I am not sure the harm is dramatically worse when the government knows what Facebook and Google already know.

Privacy harms are tricky to work out. Yet obviously no harm can come from abusing Personal Information if that information is not collected in the first place! I take away from “Privacy Lost” a clear impression of the risks created by the data wars. We are imperiled by the voracious appetite of digital businesses that hang on indefinitely to masses of data about us, while they figure out ever cleverer ways to make money out of it. This is why Collection Limitation is the first and foremost privacy protection. If a business or government doesn't have a sound and transparent reason for having Personal Information about us, then they should not have it. It’s as simple as that.

Martin Smith has highlighted the symbiosis between government and private sector surveillance. The data wars not only made dozens of billionaires but they did much of the heavy lifting for the NSA. And this situation is about to get radically more fraught. On the brink of the Internet of Things, we need to question if we want to keep drowning in data.

Posted in Social Networking, Social Media, Security, Privacy, Internet

Schrodinger's Privacy: A Master Class

Master Class: How to Protect Your Customer's Digital Identity and Personal Data

A Social Media Week Sydney event #SMWSydney
Law Lounge, Sydney University Law School
New Law School Building
Eastern Ave, Camperdown
Fri, Sep 26 - 10:00 AM - 11:30 AM

How can you navigate privacy fact and fiction, without the geeks and lawyers boring each other to death?

It's often said that technology has outpaced privacy law. Many digital businesses seem empowered by this brash belief. And so they proceed with apparent impunity to collect and monetise as much Personal Information as they can get their hands on.

But it's a myth!

Some of the biggest corporations in the world, including Google and Facebook, have been forcefully brought to book by privacy regulations. So, we have to ask ourselves:

  • what does privacy law really mean for social media in Australia?
  • is privacy "good for business"?
  • is privacy "not a technology issue"?
  • how can digital businesses navigate fact & fiction, without their geeks and lawyers boring each other to death?

In this Social Media Week Master Class I will:

  • unpack what's "creepy" about certain online practices
  • show how to rate data privacy issues objectively
  • analyse classic misadventures with geolocation, facial recognition, and predicting when shoppers are pregnant
  • critique photo tagging and crowd-sourced surveillance
  • explain why Snapchat is worth more than three billion dollars
  • analyse the regulatory implications of Big Data, Biometrics, Wearables and The Internet of Things.

We couldn't have timed this Master Class better, coming two weeks after the announcement of the Apple Watch, which will figure prominently in the class!

So please come along, for a fun and in-depth a look at social media, digital technology, the law, and decency.

Register here.

About the presenter

Steve Wilson is a technologist, who stumbled into privacy 12 years ago. He rejected those well meaning slogans (like "Privacy Is Good For Business!") and instead dug into the relationships between information technology and information privacy. Now he researches and develops design patterns to help sort out privacy, alongside all the other competing requirements of security, cost, usability and revenue. His latest publications include:

  • "The collision between Big Data and privacy law" due out in October in the Australian Journal of Telecommunications and the Digital Economy.

Posted in Social Networking, Social Media, Privacy, Internet, Biometrics, Big Data

It's not too late for privacy

Have you heard the news? "Privacy is dead!"

The message is urgent. It's often shouted in prominent headlines, with an implied challenge. The new masters of the digital universe urge the masses: C'mon, get with the program! Innovate! Don't be so precious! Don't you grok that Information Wants To Be Free? Old fashioned privacy is holding us back!

The stark choice posited between privacy and digital liberation is rarely examined with much intellectual rigor. Often, "privacy is dead" is just a tired fatalistic response to the latest breach or eye-popping digital development, like facial recognition, or a smartphone's location monitoring. In fact, those who earnestly assert that privacy is over are almost always trying to sell us something, be it sneakers, or a political ideology, or a wanton digital business model.

Is it really too late for privacy? Is the "genie out of the bottle"? Even if we accepted the ridiculous premise that privacy is at odds with progress, no it's not too late, for a couple of reasons. Firstly, the pessimism (or barely disguised commercial opportunism) generally confuses secrecy for privacy. And secondly, frankly, we aint seen nothin yet!

Conflating privacy and secrecy

Technology certainly has laid us bare. Behavioral modeling, facial recognition, Big Data mining, natural language processing and so on have given corporations X-Ray vision into our digital lives. While exhibitionism has been cultivated and normalised by the informopolists, even the most guarded social network users may be defiled by data prospectors who, without consent, upload their contact lists, pore over their photo albums, and mine their shopping histories.

So yes, a great deal about us has leaked out into what some see as an infinitely extended neo-public domain. And yet we can be public and retain our privacy at the same time. Just as we have for centuries of civilised life.

It's true that privacy is a slippery concept. The leading privacy scholar Daniel Solove once observed that "Privacy is a concept in disarray. Nobody can articulate what it means."

Some people seem defeated by privacy's definitional difficulties, yet information privacy is simply framed, and corresponding data protection laws are elegant and readily understood.

Information privacy is basically a state where those who know us are restrained in they do with the knowledge they have about us. Privacy is about respect, and protecting individuals against exploitation. It is not about secrecy or even anonymity. There are few cases where ordinary people really want to be anonymous. We actually want businesses to know - within limits - who we are, where we are, what we've done and what we like ... but we want them to respect what they know, to not share it with others, and to not take advantage of it in unexpected ways. Privacy means that organisations behave as though it's a privilege to know us. Privacy can involve businesses and governments giving up a little bit of power.

Many have come to see privacy as literally a battleground. The grassroots Cryptoparty movement came together around the heady belief that privacy means hiding from the establishment. Cryptoparties teach participants how to use Tor and PGP, and they spread a message of resistance. They take inspiration from the Arab Spring where encryption has of course been vital for the security of protestors and organisers. One Cryptoparty I attended in Sydney opened with tributes from Anonymous, and a number of recorded talks by activists who ranged across a spectrum of political issues like censorship, copyright, national security and Occupy.

I appreciate where they're coming from, for the establishment has always overplayed its security hand, and run roughshod over privacy. Even traditionally moderate Western countries have governments charging like china shop bulls into web filtering and ISP data retention, all in the name of a poorly characterised terrorist threat. When governments show little sympathy for netizenship, and absolutely no understanding of how the web works, it's unsurprising that sections of society take up digital arms in response.

Yet going underground with encryption is a limited privacy stratagem, because do-it-yourself encryption is incompatible with the majority of our digital dealings. The most nefarious and least controlled privacy offences are committed not by government but by Internet companies, large and small. To engage fairly and squarely with businesses, consumers need privacy protections, comparable to the safeguards against unscrupulous merchants we enjoy, uncontroversially, in traditional commerce. There should be reasonable limitations on how our Personally Identifiable Information (PII) is used by all the services we deal with. We need department stores to refrain from extracting health information from our shopping habits, merchants to not use our credit card numbers as customer reference numbers, shopping malls to not track patrons by their mobile phones, and online social networks to not x-ray our photo albums by biometric face recognition.

Encrypting everything we do would only put it beyond reach of the companies we obviously want to deal with. Look for instance at how the cryptoparties are organised. Some cryptoparties manage their bookings via the US event organiser Eventbrite to which attendants have to send a few personal details. So ironically, when registering for a cryptoparty, you can not use encryption!

The central issue is this: going out in public does not neutralise privacy. It never did in the physical world and it shouldn't be the case in cyberspace either. Modern society has long rested on balanced consumer protection regulations to curb the occasional excesses of business and government. Therefore we ought not to respond to online privacy invasions as if the digital economy is a new Wild West. We should not have to hide away if privacy is agreed to mean respecting the PII of customers, users and citizens, and restraining what data custodians do with that precious resource.

Data Mining and Data Refining

We're still in the early days of the social web, and the information innovation has really only just begun. There is incredible value to be extracted from mining the underground rivers of data coursing unseen through cyberspace, and refining that raw material into Personal Information.

Look at what the data prospectors and processors have managed to do already.


  • Facial recognition transforms vast stores of anonymous photos into PII, without consent, and without limitation. Facebook's deployment of biometric technology was covert and especially clever. For years they encouraged users to tag people they knew in photos. It seemed innocent enough but through these fun and games, Facebook was crowd-sourcing the facial recognition templates and calibrating their constantly evolving algorithms, without ever mentioning biometrics in their privacy policy or help pages. Even now Facebook's Data Use Policy is entirely silent on biometric templates and what they allow themselves to do with them.

    It's difficult to overstate the value of facial recognition to businesses like Facebook when they have just one asset: knowledge about their members and users. Combined with image analysis and content addressable graphical memory, facial recognition lets social media companies work out what we're doing, when, where and with whom. I call it piracy. Billions of everyday images have been uploaded over many years by users for ostensiby personal purposes, without any clue that technology would energe to convert those pictures into a commercial resource.

    Third party services like Facedeals are starting to emerge, using Facebook's photo resources for commercial facial recognition in public. And the most recent facial recognition entrepreneurs like Name Tag App boast of scraping images from any "public" photo databases they can find. But as we shall see below, in many parts of the world there are restrictions on leveraging public-facing databases, because there is a legal difference between anonymous data and identified information.

  • Some of the richest stores of raw customer data are aggregated in retailer databases. The UK department store Tesco for example is said to hold more data about British citizens than the government does. For years of course data analysts have combed through shopping history for marketing insights, but their predictive powers are growing rapidly. An infamous example is Target's covert development of methods to identify customers who are pregnant based on their buying habits. Some Big Data practitioners seem so enamoured with their ability to extract secrets from apparently mundane data, they overlook that PII collected indirectly by algorithm is subject to privacy law just as if it was collected directly by questionnaire. Retailers need to remember this as they prepare to exploit their massive loyalty databases into new financial services ventures.
  • Natural Language Processing (NLP) is the secret sauce in Apple's Siri, allowing her to take commands and dictation. Every time you dictate an email or a text message to Siri, Apple gets hold of telecommunications contet that is normally out of bounds to the phone companies. Siri is like a free PA that reports your daily activities back to the secretarial agency. There is no mention at all of Siri in Apple's Privacy Policy despite the limitless collection of intimate personal information.
  • And looking ahead, Google Glass in the privacy stakes will probably surpass both Siri and facial recognition. If actions speak louder than words, imagine the value to Google of seeing through Glass exactly what we do in real time. Digital companies wanting to know our minds won't need us to expressly "like" anything anymore; they'll be able to tell our preferences from our unexpurgated behaviours.

The surprising power of data protection regulations

There's a widespread belief that technology has outstripped privacy law, yet it turns out technology neutral data privacy law copes well with most digital developments. OECD privacy principles (enacted in over 100 countries) and the US FIPPs (Fair Information Practice Principles) require that companies be transarent about what PII they collect and why, limit the ways in which PII is used for unrelated purposes.

Privacy advocates can take heart from several cases where existing privacy regulations have proven effective against some of the informopolies' trespasses. And technologists and cynics who think privacy is hopeless should heed the lessons.


  • Google StreetView cars, while they drive up and down photographing the world, also collect Wi-Fi hub coordinates for use in geo-location services. In 2010 it was discovered that the StreetView software was also collecting unencrypted Wi-Fi network traffic, some of which contained Personal Information like user names and even passwords. Privacy Commissioners in Australia, Japan, Korea, the Netherlands and elsewhere found Google was in breach of their data protection laws. Google explained that the collection was inadverrtant, apologized, and destroyed all the wireless traffic that had been gathered.

    The nature of this privacy offence has confused some commentators and technologists. Some argue that Wi-Fi data in the public domain is not private, and "by definition" (so they like to say) categorically could not be private. Accordingly some believed Google was within its rights to do whatever it liked with such found data. But that reasoning fails to grasp the technicality that Data Protection laws in Europe, Australia and elsewhere do not essentially distinguish “public” from "private". In fact the word “private” doesn’t even appear in Australia’s “Privacy Act”. If data is identifiable, then privacy rights generally attach to it irrespective of how it is collected.

  • Facebook photo tagging was ruled unlawful by European privacy regulators in mid 2012, on the grounds it represents a collection of PII (by the operation of the biometric matching algorithm) without consent. By late 2012 Facebook was forced to shut down facial recognition and tag suggestions in the EU. This was quite a show of force over one of the most powerful companies of the digital age. More recently Facebook has started to re-introduce photo tagging, prompting the German privacy regulator to reaffirm that this use of biometrics is counter to their privacy laws.

It's never too late

So, is it really too late for privacy? Outside the United States at least, established privacy doctrine and consumer protections have taken technocrats by surprise. They have found, perhaps counter intuitively, that they are not as free as they thought to exploit all personal data that comes their way.

Privacy is not threatened so much by technology as it is by sloppy thinking and, I'm afraid, by wishful thinking on the part of some vested interests. Privacy and anonymity, on close reflection, are not the same thing, and we shouldn't want them to be! It's clearly important to be known by others in a civilised society, and it's equally important that those who do know us, are reasonably restrained in how they use that knowledge.

Posted in Social Networking, Social Media, Privacy

Facebook's lab rats

It's long been said that if you're getting something for free online, then you're not the customer, you're the product. It's a reference to the one-sided bargain for personal information that powers so many social businesses - the way that "infomopolies" as I call them exploit the knowledge they accumulate about us.

Now it's been revealed that we're even lower than product: we're lab rats.

Facebook data scientist Adam Kramer, with collaborators from UCSF and Cornell, this week reported on a study in which they tested how Facebook users respond psychologically to alternatively positive and negative posts. Their experimental technique is at once ingenious and shocking. They took the real life posts of nearly 700,000 Facebook members, and manipulated them, turning them slightly up- or down-beat. And then Kramer at al measured the emotional tone in how people reading those posts reacted in their own feeds. See Experimental evidence of massive-scale emotional contagion through social networks, Adam Kramer,Jamie Guillory & Jeffrey Hancock, in Proceedings of the National Academy of Sciences, v111.24, 17 June 2014.

The resulting scandal has been well-reported by many, including Kashmir Hill in Forbes, whose blog post nicely covers how the affair has unfolded, and includes a response by Adam Kramer himself.

Plenty has been written already about the dodgy (or non-existent) ethics approval, and the entirely contemptible claim that users gave "informed consent" to have their data "used" for research in this way. I draw attention to the fact that consent forms in properly constituted human research experiments are famously thick. They go to great pains to explain what's going on, the possible side effects and potential adverse consequences. The aim of a consent form is to leave the experimental subject in no doubt whatsoever as to what they're signing up for. Contrast this with the Facebook Experiment where they claim informed consent was represented by a fragment of one sentence buried in thousands of words of the data usage agreement. And Kash Hill even proved that the agreement was modified after the experiment started! These are not the actions of researchers with any genuine interest in informed consent.

I was also struck by Adam Kramer's unvarnished description of their motives. His response to the furore (provided by Hill in her blog) is, as she puts it, tone deaf. Kramer makes no attempt whatsoever at a serious scientific justification for this experiment:

  • "The reason we did this research is because we care about the emotional impact of Facebook and the people that use our product ... [We] were concerned that exposure to friends’ negativity might lead people to avoid visiting Facebook.

That is, this large scale psychological experiment was simply for product development.

Some apologists for Facebook countered that social network feeds are manipulated all the time, notably by advertisers, to produce emotional responses.

Now that's interesting, because for their A-B experiment, Kramer and his colleagues took great pains to make sure the subjects were unaware of the manipulation. After all, the results would be meaningless if people knew what they were reading had been emotionally fiddled with.

In contrast, the ad industry has always insisted that today's digital consumers are super savvy, and they know the difference between advertising and real-life. Yet the foundation of the Facebook experiment is that users are unaware of how their online experience is being manipulated. The ad industry's illogical propaganda [advertising is just harmless fun, consumers can spot the ads, they're not really affected by ads all that much ... Hey, with a minute] has only been further exposed by the Facebook Experiment.

Advertising companies and Social Networks are increasingly expert at covertly manipulating perceptions, and now they have the data, collected dishonestly, to prove it.

Posted in Social Networking, Social Media, Science, Privacy, Internet, Culture

Three billion was a Snap

The latest Snowden revelations include the NSA's special programs for extracting photos and identifying from the Internet. Amongst other things the NSA uses their vast information resources to correlate location cues in photos -- buildings, streets and so on -- with satellite data, to work out where people are. They even search especially for passport photos, because these are better fodder for facial recognition algorithms. The audacity of these government surveillance activities continues to surprise us, and their secrecy is abhorrent.

Yet an ever greater scale of private sector surveillance has been going on for years in social media. With great pride, Facebook recently revealed its R&D in facial recognition. They showcased the brazenly named "DeepFace" biometric algorithm, which is claimed to be 97% accurate in recognising faces from regular images. Facebook has made a swaggering big investment in biometrics.

Data mining needs raw material, there's lots of it out there, and Facebook has been supremely clever at attracting it. It's been suggested that 20% of all photos now taken end up in Facebook. Even three years ago, Facebook held 10,000 times as many photographs as the Library of Congress:

Largest photo libraries
[Picture courtesy of the now retired 1000memories.com blog]

And Facebook will spend big buying other photo lodes. Last year they tried to buy Snapchat for the spectacular sum of three billion dollars. The figure had pundits reeling. How could a start-up company with 30 people be worth so much? All the usual dot com comparisons were made; the offer seemed a flight of fancy.

But no, the offer was a rational consideration for the precious raw material that lies buried in photo data.

Snapchat generates at least 100 million new images every day. Three billion dollars was, pardon me, a snap. I figure that at a ballpark internal rate of return of 10%, a $3B investment is equivalent to $300M p.a. so even if the Snapchat volume stopped growing, Facebook would have been paying one cent for every new snap, in perpetuity.

These days, we have learned from Snowden and the NSA that communications metadata is just as valuable as the content of our emails and phone calls. So remember that it's the same with photos. Each digital photo comes from a device that embeds within the image metadata usually including the time and place of when the picture was taken. And of course each Instagram or Snapchat is a social post, sent by an account holder with a history and rich context in which the image yields intimate real time information about what they're doing, when and where.

The hallmark of the Snapchat service is transience: all those snaps are supposed to flit from one screen to another before vaporising. Now of course that idea is contestable; enthusiasts worked out pretty quickly how to retrieve snaps from old memory. And in any case, transience is a red herring, perhaps a deliberate distraction, because the metadata matters more, and Snapchat admits in its Privacy Policy that it pretty well keeps the lot:

  • When you access or use our Services, we automatically collect information about you, including:
  • Usage Information: When you send or receive messages via our Services, we collect information about these messages, including the time, date, sender and recipient of the Snap. We also collect information about the number of messages sent and received between you and your friends and which friends you exchange messages with most frequently.
  • Log Information: We log information about your use of our websites, including your browser type and language, access times, pages viewed, your IP address and the website you visited before navigating to our websites.
  • Device Information: We may collect information about the computer or device you use to access our Services, including the hardware model, operating system and version, MAC address, unique device identifier, phone number, International Mobile Equipment Identity ("IMEI") and mobile network information. In addition, the Services may access your device's native phone book and image storage applications, with your consent, to facilitate your use of certain features of the Services.
  • Location Information: With your consent, we may collect information about the location of your device to facilitate your use of certain features of our Services, determine the speed at which your device is traveling, add location-based filters to your Snaps (such as local weather), and for any other purpose described in this privacy policy.

Snapchat goes on to declare it may use any of this information to "personalize and improve the Services and provide advertisements, content or features that match user profiles or interests" and it reserves the right to share any information with "vendors, consultants and other service providers who need access to such information to carry out work on our behalf".

So back to the data mining: nothing stops Snapchat -- or a new parent company -- running biometric facial recognition over the snaps as they pass through the servers, to extract additional "profile" information. And there's an extra kicker that makes Snapchats extra valuable for biometric data miners. The vast majority of Snapchats are selfies. So if you extract a biometric template from a snap, you already know who it belongs to, without anyone having to tag it. Snapchat would provide a hundred million auto-calibrations every day for facial recognition algorithms! On Facebook, the privacy aware turn off photo tagging, but with Snapchats, self identification is inherent to the experience and is unlikely to be ever be disabled.

NSA has all your selfies

As I've discussed before, the morbid thrill of Snowden's spying revelations has tended to overshadow his sober observations that when surveillance by the state is probably inevitable, we need to be discussing accountability.

While we're all ventilating about the NSA, it's time we also attended to private sector spying and properly debated the restraints that may be appropriate on corporate exploitation of social data.

Personally I'm much more worried that an infomopoly has all my selfies.

Have a disruptive technology implementation story? Get recognised for your leadership. Apply for the 2014 SuperNova Awards for leaders in disruptive technology.

Posted in Social Networking, Social Media, Privacy, Biometrics, Big Data

The strengths and weaknesses of Data Privacy in the Age of Big Data

This is the abstract of a current privacy conference proposal.

Synopsis

Many Big Data and online businesses proceed on a naive assumption that data in the "public domain" is up for grabs; technocrats are often surprised that conventional data protection laws can be interpreted to cover the extraction of PII from raw data. On the other hand, orthodox privacy frameworks don't cater for the way PII can be created in future from raw data collected today. This presentation will bridge the conceptual gap between data analytics and privacy, and offer new dynamic consent models to civilize the trade in PII for goods and services.

Abstract

It’s often said that technology has outpaced privacy law, yet by and large that's just not the case. Technology has certainly outpaced decency, with Big Data and biometrics in particular becoming increasingly invasive. However OECD data privacy principles set out over thirty years ago still serve us well. Outside the US, rights-based privacy law has proven effective against today's technocrats' most worrying business practices, based as they are on taking liberties with any data that comes their way. To borrow from Niels Bohr, technologists who are not surprised by data privacy have probably not understood it.

The cornerstone of data privacy in most places is the Collection Limitation principle, which holds that organizations should not collect Personally Identifiable Information beyond their express needs. It is the conceptual cousin of security's core Need-to-Know Principle, and the best starting point for Privacy-by-Design. The Collection Limitation principle is technology neutral and thus blind to the manner of collection. Whether PII is collected directly by questionnaire or indirectly via biometric facial recognition or data mining, data privacy laws apply.

It’s not for nothing we refer to "data mining". But few of those unlicensed data gold diggers seem to understand that the synthesis of fresh PII from raw data (including the identification of anonymous records like photos) is merely another form of collection. The real challenge in Big Data is that we don’t know what we’ll find as we refine the techniques. With the best will in the world, it is hard to disclose in a conventional Privacy Policy what PII might be collected through synthesis down the track. The age of Big Data demands a new privacy compact between organisations and individuals. High minded organizations will promise to keep people abreast of new collections and will offer ways to opt in, and out and in again, as the PII-for-service bargain continues to evolve.

Posted in Social Networking, Privacy, Biometrics, Big Data

The Snapchat data breach

Yesterday it was reported by The Verge that anonymous hackers have accessed Snapchat's user database and posted 4.6 million user names and phone numbers. In an apparent effort to soften the blow, two digits of the phone numbers were redacted. So we might assume this is a "white hat" exercise, designed to shame Snapchat into improving their security. Indeed, a few days ago Snapchat themselves said they had been warned of vulnerabilities in their APIs that would allow a mass upload of user records.

The response of many has been, well, so what? Some people have casually likened Snapchat's list to a public White Pages; others have played it down as "just email addresses".

Let's look more closely. The leaked list was not in fact public names and phone numbers; it was user names and phone numbers. User names might often be email addresses but these are typically aliases; people frequently choose email addresses that reveal little or nothing of their real world identity. We should assume there is intent in an obscure email address for the individual to remain secret.

Identity theft has become a highly organised criminal enterprise. Crime gangs patiently acquire multiple data sets over many months, sometimes years, gradually piecing together detailed personal profiles. It's been shown time and time again by privacy researchers (perhaps most notably Latanya Sweeney) that re-identification is enabled by linking diverse data sets. And for this purpose, email addresses and phone numbers are superbly valuable indices for correlating an individual's various records. Your email address is common across most of your social media registrations. And your phone number allows your real name and street address to be looked up from reverse White Pages. So the Snapchat breach could be used to join aliases or email addresses to real names and addresses via the phone numbers. For a social engineering attack on a call centre -- or even to open a new bank account -- an identity thief can go an awful long way with real name, street address, email address and phone number.

I was asked in an interview to compare the theft of stolen phone numbers with social security numbers. I surprised the interviewer when I said phone numbers are probably even more valuable to the highly organised ID thief, for they can be used to index names in public directories, and to link different data sets, in ways that SSNs (or credit card numbers for that matter) cannot.

So let us start to treat all personal inormation -- especially when aggregated in bulk -- more seriously! And let's be more cautious in the way we categorise personal or Personally Identifiable Information (PII).

Importantly, most regulatory definitions of PII already embody the proper degree of caution. Look carefully at the US government definition of Personally Identifiable Information:

  • information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual (underline added).

This means that items of data can constitute PII if other data can be combined to identify the person concerned. That is, the fragments are regarded as PII even if it is the whole that does the identifying.

And remember that the middle I in PII stands for Identifiable, and not, as many people presume, Identifying. To meet the definition of PII, data need not uniquely identify a person, it merely needs to be directly or indirectly identifiable with a person. And this is how it should be when we heed the way information technologies enable identification through linkages.

Almost anywhere else in the world, data stores like Snapchat's would automatically fall under data protection and information privacy laws; regulators would take a close look at whether the company had complied with the OECD Privacy Principles, and whether Snapchat's security measures were fit for purpose given the PII concerned. But in the USA, companies and commentators alike still have trouble working out how serious these breaches are. Each new breach is treated in an ad hoc manner, often with people finessing the difference between credit card numbers -- as in the recent Target breach -- and "mere" email addresses like those in the Snapchat and Epsilon episodes.

Surely the time has come to simply give proper regulatory protection to all PII.

Posted in Social Networking, Social Media, Security, Privacy, Identity, Fraud, Big Data