Facial recognition is digital alchemy. It's the prince of data mining.
Facial recognition takes previously anonymous images and conjures people's identities. It's an invaluable capability. Once they can pick out faces in crowds, trawling surreptitiously through anyone and everyone's photos, the social network businesses can work out what we're doing, when and where we're doing it, and who we're doing it with. The companies figure out what we like to do without us having to 'like' or favorite anything.
So Google, Facebook, Apple et al. have invested hundreds of megabucks in face recognition R&D and buying technology start-ups. And they spend billions of dollars buying images and especially faces, going back to Google's acquisition of Picasa in 2004, and most recently, Facebook's ill-fated $3 billion offer for Snapchat.
But if most people find face recognition rather too creepy, then there is cause for optimism. The technocrats have gone too far. What many of them still don't get is this: If you take anonymous data (in the form of photos) and attach names to that data (which is what Facebook photo tagging does - it guesses who the people in photos are, attaches putative names to records, and invites users to confirm them) then you Collect Personal Information. Around the world, existing pre-biometrics era black letter Privacy Law says you can't Collect PII even indirectly like that without an express reason and without consent.
When automatic facial recognition converts anonymous data into PII, it crosses a bright line in the law.
Exploring new strategic opportunities for CIOs and CISOs.
For as long as we've had a distinct information security profession, it has been said that security needs to be a "business enabler". But what exactly does that mean? How can security professionals advance from their inherently defensive postures, into more strategic positions, and contribute actively to the growth of the business? This is the focus of my latest work at Constellation Research. It turns out that security professionals have special tools and skills ideally suited to a broader strategic role in information management.
The role of Chief Information Security Officer (CISO) is a tough one. Security is red hot. Not a week goes by without news of another security breach.
Information now is the lifeblood of most organisations; CISOs and their teams are obviously crucial in safeguarding that. But a purely defensive mission seldom allows for much creativity, or a positive reputation amongst one's peers. A predominantly reactive work mode -- as important as it is from day to day -- can sometimes seem precarious. The good news for CISOs' career security and job satisfaction is they happen to have special latent skills to innovate and build out those most important digital assets.
Information assets are almost endless: accounts, ledgers and other legal records, sales performance, stock lists, business plans, R&D plans, product designs, market analyses and forecasts, customer data, employee files, audit reports, patent specifications and trade secrets. But what is it about all this information that actually needs protecting? What exactly makes any data valuable? These questions take us into the mind of the CISO.
Security management is formally all about the right balance of Confidentiality, Integrity and Availability in the context of the business. Different businesses have different needs in these three dimensions.
Think of the famous industrial secrets like the recipes for KFC or Coca Cola. These demand the utmost confidentiality and integrity but the availability of the information can be low (nay, must be low) because it is accessed as a whole so seldom. Medical records too have traditionally needed confidentiality more than availability, but that's changing. Complex modern healthcare demands electronic records, and these do need high availability especially in emergency care settings.
In contrast, for public information like stock prices there is no value in confidentiality whatsoever, and instead, availability and integrity are paramount. On the other hand, market-sensitive information that listed companies periodically report to stock exchanges must have very strict confidentiality for a relatively brief period.
Security professionals routinely compile Information Asset Inventories and plan for appropriate C-I-A for each type of data held. From there, a Threat & Risk Assessment (TRA) is generally undertaken, to examine the adverse events that might compromise the Confidentiality, Integrity and/or Availability. The likelihood and the impact of each adverse event are estimated and multiplied together to gauge the practical risk posed by each known threat. By prioritising counter-measures for the identified threats, in line with the organisation's risk appetite, the TRA helps guide a rational program of investment in security.
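As a rough illustration of the TRA arithmetic described above, here is a minimal Python sketch. The 1-to-5 scoring scales, the example threats, their scores and the risk appetite threshold are all assumptions invented for illustration; a real assessment would follow the organisation's own methodology.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    asset: str        # the information asset the threat would compromise
    likelihood: int   # assumed ordinal scale: 1 (rare) to 5 (almost certain)
    impact: int       # assumed ordinal scale: 1 (minor) to 5 (severe)

    @property
    def risk(self) -> int:
        # Practical risk is gauged as likelihood multiplied by impact.
        return self.likelihood * self.impact


def prioritise(threats, risk_appetite):
    """Return the threats whose risk exceeds the appetite, highest risk first."""
    over = [t for t in threats if t.risk > risk_appetite]
    return sorted(over, key=lambda t: t.risk, reverse=True)


# Hypothetical entries drawn from an Information Asset Inventory.
threats = [
    Threat("Customer database breach", "customer data", likelihood=3, impact=4),
    Threat("Trade secret leak", "product designs", likelihood=1, impact=5),
    Threat("Price feed outage", "stock prices", likelihood=4, impact=2),
]

for t in prioritise(threats, risk_appetite=6):
    print(f"{t.name} ({t.asset}): risk {t.risk}")
```

With these made-up scores, the breach and the outage exceed the appetite and would be first in line for counter-measures, while the trade secret leak falls within tolerance.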
Now their practical experience can put CISOs in a special position to enhance their organisation's information assets rather than restrict themselves to hardening information against just the negative impacts.
Here's where the CISO's mindset comes into play in a new way. The real value of information lies not so much in the data itself as in its qualities. Remember the cynical old saw "It's not what you know, it's who you know". There's a serious side to the saying, which highlights that really useful information has pedigree.
So the real action is in the metadata; that is, data about data. It may have got a bad rap recently thanks to surveillance scandals, but various thinkers have long promoted the importance of metadata. For example, back in the 1980s, Citibank CEO Walter Wriston famously said "information about money will become almost as important as money itself". What a visionary endorsement of metadata!
The important latent skill I want to draw out for CISOs is their practiced ability to deal with the qualities of data. To bring greater value to the business, CISOs can start thinking about the broader pedigree of data and not merely its security qualities. They should spread their wings beyond C-I-A, to evaluate all sorts of extra dimensions, like completeness, reliability, originality, currency, privacy and regulatory compliance.
The core strategic questions for the modern CISO are these: What is it about your corporate information that gives you competitive advantage? What exactly makes information valuable?
The CISO has the mindset and the analytical tools to surface these questions and positively engage their executive peers in finding the answers.
My new Constellation Research report will be published soon.
In response to "The Solace of Oblivion", Jeffrey Toobin, The New Yorker, September 29th, 2014.
The "Right to be Forgotten" is an unfortunate misnomer for a balanced data control measure decided by the European Court of Justice. The new rule doesn't seek to erase the past but rather to restore some of its natural distance. Privacy is not about secrecy but moderation. Yet restraint is toxic to today's information magnates, and the response of some to even the slightest throttling of their control of data has been shrill. Google doth protest too much when it complains that having to adjust its already very elaborate search filters makes it an unwilling censor.
The result of a multi-billion dollar R&D program, Google's search engine is a great deal more than a latter-day microfiche. Its artificial intelligence tries to predict what users are really looking for, and as a result, its insights are all the more valuable to Google's real clients -- the advertisers. No search result is a passive reproduction of data from a "public domain". Google makes the public domain public. So if search reveals Personal Information that was hitherto impossible to find, then Google should at least participate in helping to temper the unintended privacy consequences.
October 1, 2014.
I was discussing definitions of Personally Identifiable Information (PII) with some lawyers today, one of whom took exception to the US General Services Administration definition: "information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual". This lawyer concluded rather hysterically that under such a definition, "nobody can use the internet without a violation".
Similarly, I've seen engineers in Australia recoil at the possibility that IP and MAC Addresses might be treated as PII because it is increasingly easy to link them to the names of device owners. I was recently asked "Why are they stopping me collecting IP addresses?". The answer is, they're not.
There are a great many misconceptions about privacy, but the idea that 'if it's personal you can't use it' is by far the worst.
Nothing in any broad-based data privacy law I know of says personal information cannot be collected or used.
Rather, what data privacy laws actually say is: if you're collecting and using PII, be careful.
Privacy is about restraint. The general privacy laws of Australia, Europe and 100-odd countries say things like don't collect PII without consent, don't collect PII beyond what you demonstrably need, don't use PII collected for one purpose for other unrelated purposes, tell individuals if you can what PII you hold about them, give people access to the PII you have, and do not retain PII for longer than necessary.
Such rules are entirely reasonable, and impose marginal restrictions on the legitimate conduct of business. And they align very nicely with standard security practice which promotes the Need To Know principle and the Principle of Least Privilege.
Compliance with Privacy Principles does add some overhead to data management compared with anonymous data. If re-identification techniques and ubiquitous inter-connectedness mean that hardly any data is going to stay anonymous anymore, then yes, privacy laws mean that data should be treated more cautiously than was previously the case. And what exactly is wrong with that?
If data is the new gold then it's time data custodians took more care.
A repeated refrain of cynics and “infomopolists” alike is that privacy is dead. People are supposed to know that anything on the Internet is up for grabs. In some circles this thinking turns into digital apartheid; some say if you’re so precious about your privacy, just stay offline.
But socialising and privacy are hardly mutually exclusive; we don’t walk around in public with our names tattooed on our foreheads. Why can’t we participate in online social networks in a measured, controlled way without submitting to the operators’ rampant X-ray vision? There is nothing inevitable about trading off privacy for conviviality.
The privacy dangers in Facebook and the like run much deeper than the self-harm done by some people’s overly enthusiastic sharing. Promiscuity is actually not the worst problem, and neither is the notorious difficulty of navigating complex and ever changing privacy settings.
The advent of facial recognition presents far more serious and subtle privacy challenges.
Facebook has invested heavily in face recognition technology, and not just for fun. Facebook uses it in effect to crowd-source the identification and surveillance of its members. With facial recognition, Facebook is building up detailed pictures of what people do, when, where and with whom.
You can be tagged without consent in a photo taken and uploaded by a total stranger.
The majority of photos uploaded to personal albums over the years were not intended for anything other than private viewing.
Under the privacy law of Australia and data protection regulations in dozens of other jurisdictions, what matters is whether data is personally identifiable. The Commonwealth Privacy Act 1988 (as amended in 2014) defines “Personal Information” as: “information or an opinion about an identified individual, or an individual who is reasonably identifiable”.
Whenever Facebook attaches a member’s name to a photo, they are converting hitherto anonymous data into Personal Information, and in so doing, they become subject to privacy law. Automated facial recognition represents an indirect collection of Personal Information. However, too many people still underestimate the privacy implications. Some technologists naively claim that faces are “public” and that people can have no expectation of privacy in their facial images. That ignores the point that information privacy, as explained, is about the identifiability and identification of data; the words “public” and “private” don’t even figure in the Privacy Act!
If a government was stealing into our photo albums, labeling people and profiling them, there would be riots. It's high time that private sector surveillance - for profit - is seen for what it is, and stopped.
Ed Snowden was interviewed today as part of the New Yorker festival. This TechCrunch report says Snowden "was asked a couple of variants on the question of what we can do to protect our privacy. His first answer called for a reform of government policies." He went on to add some remarks about Google, Facebook and encryption and that's what the report chose to focus on. The TechCrunch headline: "Snowden's Privacy Tips".
Mainstream and even technology media reportage does Snowden a terrible disservice and takes the pressure off government policy.
I've listened to the New Yorker online interview. After being asked by a listener what they should do about privacy, Snowden gave a careful, nuanced, and comprehensive answer over five minutes. His very first line was that this is an incredibly complex topic, and he did well to stick to plain language throughout. He canvassed a great many issues including: the need for policy reform, the 'Nothing to Hide' argument, the inversion of civil rights when governments ask us to justify the right to be left alone, the collusion of companies and governments, the poor state of product security and usability, the chilling effect on industry of government intervention in security, metadata, and the radicalisation of computer scientists today being comparable with physicists in the Cold War.
Only after all that, and a follow up question about 'ordinary people', did Snowden say 'don't use Dropbox'.
Consistently, when Snowden is asked what to do about privacy, his answers are primarily about politics not technology. When pressed, he dispenses the odd advice about using Tor and disk encryption, but Snowden's chief concerns (as I have discussed in depth previously) are around accountability, government transparency, better cryptology research, better security product quality, and so on. He is no hacker.
I am simply dismayed how Snowden's sophisticated analyses are dumbed down to security tips. He has never been a "cyber Agony Aunt". The proper response to NSA overreach has to be agitation for regime change, not do-it-yourself cryptography. That is Snowden's message.
Tonight, Australian Broadcasting Corporation’s Four Corners program aired a terrific special, "Privacy Lost" written and produced by Martin Smith from the US public broadcaster PBS’s Frontline program.
Here we have a compelling demonstration of the importance and primacy of Collection Limitation for protecting our privacy.
UPDATE: The program we saw in Australia turns out to be a condensed version of PBS's two part The United States of Secrets from May 2014.
About the program
Martin Smith summarises brilliantly what we know about the NSA’s secret surveillance programs, thanks to the revelations of Ed Snowden, the Guardian’s Glenn Greenwald and the Washington Post’s Barton Gellman; he holds many additional interviews with Julia Angwin (author of “Dragnet Nation”), Chris Hoofnagle (UC Berkeley), Steven Levy (Wired), Christopher Soghoian (ACLU) and Tim Wu (“The Master Switch”), to name a few. Even if you’re thoroughly familiar with the Snowden story, I highly recommend “Privacy Lost” or the original "United States of Secrets" (which unlike the Four Corners edition can be streamed online).
The program is a ripping re-telling of Snowden’s expose, against the backdrop of George W. Bush’s PATRIOT Act and the mounting suspicions through the noughties of NSA over-reach. There are freshly told accounts of the intrigues, of secret optic fibre splitters installed very early on in AT&T’s facilities, scandals over National Security Letters, and the very rare case of the web hosting company Calyx who challenged their constitutionality (and yet today, with the letter withdrawn, remains unable to tell us what the FBI was seeking). The real theme of Smith’s take on surveillance then emerges, when he looks at the rise of data-driven businesses -- first with search, then advertising, and most recently social networking -- and the “data wars” between Google, Facebook and Microsoft.
In my view, the interplay between government surveillance and digital businesses is the most important part of the Snowden epic, and it receives the proper emphasis here. The depth and breadth of surveillance conducted by the private sector, and the insights revealed about what people might be up to, create irresistible opportunities for the intelligence agencies. Hoofnagle tells us how the FBI loves Facebook. And we see the discovery of how the NSA exploits the tracking that’s done by the ad companies, most notably Google’s “PREF” cookie.
One of the peak moments in “Privacy Lost” comes when Gellman and his specialist colleague Ashkan Soltani present their evidence about the PREF cookie to Google – offering an opportunity for the company to comment before the story is to break in the Washington Post. The article ran on December 13, 2013; we're told it was then the true depth of the privacy problem was revealed.
My point of view
Smith takes as a given that excessive intrusion into private affairs is wrong, without getting into the technical aspects of privacy (such as frameworks for data protection, and various Privacy Principles). Neither does he unpack the actual privacy harms. And that’s fine -- a TV program is not the right place to canvass such technical arguments.
When Gellman and Soltani reveal that the NSA is using Google’s tracking cookie, the government gets joined irrefutably to the private sector in a mass surveillance apparatus. And yet I am not sure the harm is dramatically worse when the government knows what Facebook and Google already know.
Privacy harms are tricky to work out. Yet obviously no harm can come from abusing Personal Information if that information is not collected in the first place! I take away from “Privacy Lost” a clear impression of the risks created by the data wars. We are imperiled by the voracious appetite of digital businesses that hang on indefinitely to masses of data about us, while they figure out ever cleverer ways to make money out of it. This is why Collection Limitation is the first and foremost privacy protection. If a business or government doesn't have a sound and transparent reason for having Personal Information about us, then they should not have it. It’s as simple as that.
Martin Smith has highlighted the symbiosis between government and private sector surveillance. The data wars not only made dozens of billionaires but they did much of the heavy lifting for the NSA. And this situation is about to get radically more fraught. On the brink of the Internet of Things, we need to question if we want to keep drowning in data.
The "Right to be Forgotten" debate reminds me once again of the cultural differences between technology and privacy.
On September 30, I was honoured to be part of a panel discussion hosted by the IEEE on RTBF; a recording can be viewed here. In a nutshell, the European Court of Justice has decided that European citizens have the right to ask search engine businesses to suppress links to personal information, under certain circumstances. I've analysed and defended the aims of the ECJ in another blog.
One of the IEEE talking points was why RTBF has attracted so much scorn. My answer was that some critics appear to expect perfection in the law; when they look at the RTBF decision, all they see is problems. Yet nobody thinks this or any law is perfect; the question is whether it helps improve the balance of rights in a complex and fast changing world.
It's a little odd that technologists in particular are so critical of imperfections in the law, when they know how flawed technology is. Indeed, the security profession is almost entirely concerned with patching problems, and reminding us there will never be perfect security.
Of course there will be unwanted side-effects of the new RTBF rules and we should trust that over time these will be reviewed and dealt with. I wish that privacy critics could be more humble about this unfolding environment. I note that when social conservatives complain about online pornography, or when police decry encryption as a tool of criminals, technologists typically play those problems down as the unintended consequences of new technologies, which on average overwhelmingly do good not evil.
And it's the same with the law. It really shouldn't be necessary to remind anyone that laws have unintended consequences, for they are the stuff of the entire genre of courtroom drama. So everyone take heart: the good guys nearly always win in the end.
The European Court of Justice recently ruled on the so-called "Right to be Forgotten" granting members of the public limited rights to request that search engines like Google suppress links to Personal Information under some circumstances. The decision has been roundly criticised by technologists and by American libertarians -- acting out the now familiar ritualised privacy arguments around human rights, freedom of speech, free market forces and freedom to innovate (and hence the bad pun in the title of this article). Surprisingly, even a privacy advocate like Jules Polonetsky (quoted in The New Yorker) has a problem with the ECJ judgement because he seems to think it's extremist.
Of the various objections, the one I want to answer here is that search engines should not have to censor "facts" retrieved from the "public domain".
On September 30, I am participating in a live panel discussion of the Right To Be Forgotten, hosted by the IEEE; you can register here and download a video recording of the session later.
Update: recording now available here.
In an address on August 18, the European Union's Justice Commissioner Martine Reicherts made the following points about the Right to be Forgotten (RTBF):
- "[The European Court of Justice] said that individuals have the right to ask companies operating search engines to remove links with personal information about them -- under certain conditions. This applies when information is inaccurate, for example, or inadequate, irrelevant, outdated or excessive for the purposes of data processing. The Court explicitly ruled that the right to be forgotten is not absolute, but that it will always need to be balanced against other fundamental rights, such as the freedom of expression and the freedom of the media -- which, by the way, are not absolute rights either".
In the current (September 29, 2014) issue of The New Yorker, senior legal analyst Jeffrey Toobin looks at RTBF in the article "The Solace of Oblivion". It's a balanced review of a complex issue, which acknowledges the transatlantic polarization of privacy rights and freedom of speech.
Toobin interviewed Kent Walker, Google's general counsel. Walker said Google likes to think of itself as a "card catalogue": "We don't create the information. We make it accessible. A decision like [the ECJ's], which makes us decide what goes inside the card catalogue, forces us into a role we don't want."
But there's a great deal more to search than Walker lets on.
Google certainly does create fresh Personal Information, and in stupendous quantities. Their search engine is the bedrock of a hundred billion dollar business, founded on a mission to "organize the world's information". Google search is an incredible machine, the result of one of the world's biggest ever and ongoing software R&D projects. Few of us now can imagine life without Internet search and instant access to limitless information that would otherwise be utterly invisible. Search really is magic – just as Arthur C. Clarke said any sufficiently advanced technology would be.
On its face therefore, no search result is a passive reproduction of data from a "public domain". Google makes the public domain public.
But while search is free, it is hyper profitable, for the whole point of it is to underpin a gigantic advertising business. The search engine might not create the raw facts and figures in response to our queries, but it covertly creates and collects symbiotic metadata, complicating the picture. Google monitors our search histories, interests, reactions and habits, as well as details of the devices we're using, when and where and even how we are using them, all in order to divine our deep predilections. These insights are then provided in various ways to Google's paying customers (advertisers) and are also fed back into the search engine, to continuously tune it. The things we see courtesy of Google are shaped not only by their page ranking metrics but also by the company's knowledge of our preferences (which it forms by watching us across the whole portfolio of search, Gmail, maps, YouTube, and the Google+ social network). When we search for something, Google tries to predict what we really want to know.
In the modern vernacular, Google hacks the public domain.
The collection and monetization of personal metadata is inextricably linked to the machinery of search. The information Google serves up to us is shaped and transformed to such an extent, in the service of Google's business objectives, that it should be regarded as synthetic and therefore the responsibility of the company. Their search algorithms are famously secret, putting them beyond peer review; nevertheless, there is a whole body of academic work now on the subtle and untoward influences that Google exerts as it filters and shapes the version of reality it thinks we need to see.
Some objections to the RTBF ruling see it as censorship, or meddling with the "truth". But what exactly is the state of the truth that Google purportedly serves up? Search results are influenced by many arbitrary factors of Google's choosing; we don't know what those factors are, but they are dictated by Google's business interests. So in principle, why is an individual's interests in having some influence over search results any less worthy than Google's? The "right to be forgotten" is an unfortunate misnomer: it is really more of a 'limited right to have search results filtered differently'.
If Google's machinery reveals Personal Information that was hitherto impossible to find, then why shouldn't it at least participate in protecting the interests of the people affected? I don't deny that modern technology and hyper-connectivity create new challenges for the law, and that traditional notions of privacy may be shifting. But it's not a step-change, and in the meantime, we need to tread carefully. There are as many unintended consequences and problems in the new technology as there are in the established laws. The powerful owners and beneficiaries of these technologies should accept some responsibility for the privacy impacts. With its talents and resources, Google could rise to the challenge of better managing privacy, instead of pleading that it's not their problem.
Another week, another security collaboration launch!
"Simply Secure" calls itself "a small but growing organization [with] expertise in usability research, design, software development, and product management". Their mission has to do with improving the security functions that are built in so badly in most software today. Simply Secure is backed by Google and Dropbox, and supported by a diverse advisory board.
It's early days (actually early day, singular) so it might be churlish to point out that Simply Secure's strategic messaging is a little uneven ... except that the words being used to describe it shed light on the clarity of the thinking.
My first exposure to Simply Secure came last night, when I read an article in the Guardian by Cory Doctorow (who is one of their advisers). Doctorow places enormous emphasis on privacy; the word "privacy" outnumbers "security" 16 to three in the body of his column. Another admittedly shorter report about the launch by The Next Web doesn't mention privacy at all. And then there's the Simply Secure blog post, which cites privacy a great deal but every single time in conjunction with security, as in "security and privacy". That repeated phrasing conveys, to me at least, some discomfort. As I say, it's early days and the team is doubtless sorting out how to weigh and progress these closely related objectives.
But I hope they do it quickly. On the face of it, Simply Secure might only scratch the surface of privacy.
Doctorow's Guardian article is mostly concerned with encryption and the terrible implementations that have plagued us since the dawn of the Internet. It's definitely important that we improve here – and radically. If the Simply Secure initiative does nothing but make encryption easier to integrate into commodity software, that would be a great thing. I'm all for it. But it won't necessarily or even probably lead to better privacy, because privacy is about restraint not secrecy or anonymity.
As we go about our lives, we actually want to be known by others, but we want those who know us to be restrained in what they do with the knowledge they have about us. Privacy is the protection you need when your affairs are not secret.
I know Doctorow knows this – I've seen his terrific little speech on the steps of Comic-Con about PRISM. So I'm confused by his focus on cryptography.
How far does encryption get us? If we're using social networks, or if we're shopping and opting in to loyalty programs or selected targeted marketing, or if we're sharing our medical records with relatives, medicos, hospitals and researchers, then encryption becomes moot. We need mechanisms to restrain what the receivers of our personal information do with it. We all know the business model at work behind "free" online services; using encryption to protect privacy in social networking for instance would be like using an armoured van to deliver your valuables to Bernie Madoff.
Another limitation of user-centric or user-managed encryption has to do with Big Data. A great deal of personal information about us is created and collected unseen behind our backs, by sensors, and by analytics processes that manage to work out who we are by linking disparate data streams together. How could Simply Secure ameliorate those sorts of problems? If the Simply Secure vision includes encryption at rest as well as in transit, then how will the user control or even see all the secondary uses of their encrypted personal information?
There's a combativeness in Doctorow's explanation of Simply Secure and his tweets from yesterday on the topic. His aim is expressly to thwart the surveillance state, which in his view includes a symbiosis (if not conspiracy) between government and internet companies, where the former gets their dirty work done by the latter. I'm sure he and I both find that abhorrent in equal measure. But I argue the proper response to these egregious behaviours is political not technological (and political in the broad sense; I love that Snowden talks as much about accountability, legal processes, transparency and research as he does about encryption). If you think the government is exploiting the exploiters, then DIY encryption is a pretty narrow counter-measure. This is not the sort of society we want to live in, so let's work to change the establishment, rather than try to take it on in a crypto shoot-out.
Yes security technology is important but it's not nearly as important for privacy as the Rule of Law. Data privacy regimes instil restraint. The majority of businesses come to know that they are not at liberty to over-collect personal information, nor to re-use personal information unexpectedly and without consent. A minority of organisations flout data privacy principles, for example by slyly refining raw data into valuable personal knowledge, exploiting the trust citizens and users put in them. Some of these outfits flourish in the United States – the Canary Islands of privacy. Worldwide, the policing of privacy is patchy indeed, yet there have been spectacular legal victories in Europe and elsewhere against the excessive practices of really big companies like Facebook with their biometric data mining of photo albums, and Google's drift net-like harvesting of traffic from unencrypted Wi-Fi networks.
Pragmatically, I'm afraid encryption is such a fragile privacy measure. Once secrecy is penetrated, we need regulations to stem exploitation of our personal information.
By all means, let's improve cryptographic engineering and I wish the Simply Secure initiative all the best. So long as they don't call security privacy.