An Engadget report today, "Hangouts eavesdrops on your chats to offer 'smart suggestions'" describes a new "spy/valet" feature being added to Google's popular video chat tool.
- "Google's Hangouts is gaining a handy, but slightly creepy new feature today. The popular chat app will now act as a digital spy-slash-valet by eavesdropping on your conversations to offer 'smart suggestions.' For instance, if a pal asks 'where are you?' it'll immediately prompt you to share your location, then open a map so you can pin it precisely."
It's sad that this sort of thing still gets meekly labeled as "creepy". The privacy implications are serious and pretty easy to see.
Google is evidently processing the text of Hangouts as they fly through their system, extracting linguistic cues, interpreting what's being said using Artificial Intelligence, extracting new meaning and insights, and offering suggestions.
We need some clarification about whether any covert tests of this technology have been undertaken during the R&D phase. A company obviously doesn't launch a new product like this without a lot of research, feasibility testing, prototyping and testing. Serious work on 'smart suggestions' would not start without first testing how it works in real life. So I wonder if any of this evaluation was done covertly on live data? Are Google researchers routinely eavesdropping on hangouts to develop the 'smart suggestions' technology?
Many people have said to me I'm jumping the gun, and that Google would probably test the new Hangouts feature on its own employees. Perhaps, but given that scanning gmails is situation normal for Google, and they have a "privacy" culture that joins up all their business units so that data may be re-purposed almsot without limit, I feel sure that running AI algorithms on text without telling people would be par for the course.
In development and in operation, we need to know what steps are taken to protect the privacy of hangout data. What personally identifiable data and metadata is retained for other purposes? Who inside Google is granted access to the data and especially the synthtised insights? How long does any secondary usage persist for? Are particularly sensitive matters (like health data, financial details, corporate intellectual property etc.) filtered out?
This is well beyond "creepy". Hangouts and similar video chat are certainly wonderdful technologies. We're using them routinely for teaching, education, video conferencing, collaboration and consultation. The tools may become entrenched in corporate meetings, telecommuting, healthcare and the professions. But if I am talking with my doctor, or discussing patents with my legal team, or having a clandestine chat with a lover, I clearly do not want any unsolicited contributions from the service provider. More fundamentally, I want assurance that no machine is ever tapping into these sorts of communications, running AI algorithms, and creating new insights. If I'm wrong about covert testing on live data, then Google could do what Apple did and publish an Open Letter clarifying their data usage practices and strategies.
Come to think of it, if Google is running natural language processing algorithms over the Hangouts stream, might they be augmenting their gmail scanning the same way? Their business model is to extract insights about users from any data they get their hands on. Until now it's been a crude business of picking out keywords and using them to profile users' interests and feed targeted advertising. But what if they could get deeper information about us through AI? Is there any sign from their historical business practices that Google would not do this? And what if they can extract sensitive information like mental health indications? Even with good intent and transarency, predicting healthcare from social media is highly problematic as shown by the "Samaritans Radar" experience.
Artificial Intelligence is one of the new frontiers. Hot on the heels of the successes of IBM Watson, we're seeing Natural Language Processing and analytics rapidly penetrate business and now consumer applications. Commentators are alternately telling us that AI will end humanity, and not to worry about it. For now, I call on people to simply think clearly through the implications, such as for privacy. If AI programs are clever enough to draw deep insights about us from what we say, then the "datapreneurs" in charge of those algorithms need to remember they are just as accountable for privacy as if they have asked us reveal all by filling out a questionnaire.
In my last blog Improving the Position of the CISO, I introduced the new research I've done on extending the classic "Confidentiality-Integrity-Availability" (C-I-A) frame for security analysis, to cover all the other qualities of enterprise information assets. The idea is to build a comprehensive account of what it is that makes information valuable in the context of the business, leveraging the traditional tools and skills of the CISO. After all, security professionals are particularly good at looking at context. Instead of restricting themselves to defending information assets against harm, CISOs can be helping to enhance those assets by building up their other competitive attributes.
Let's look at some examples of how this would work, in some classic Big Data applications in retail and hospitality.
Companies in these industries have long been amassing detailed customer databases under the auspices of loyalty programs. Supermarkets have logged our every purchase for many years, so they can for instance put together new deals on our favorite items, from our preferred brands, or from competitors trying to get us to switch brands. Likewise, hotels track when we stay and what we do, so they can personalise our experience, tailor new packages for us, and try to cross-sell services they predict we'll find attractive. Behind the scenes, the data is also used for operations to plan demand, fine tune their logistics and so on.
Big Data techniques amplify the value of information assets enormously, but they can take us into complicated territory. Consider for example the potential for loyalty information to be parlayed into insurance and other financial services products. Supermarkets find they now have access to a range of significant indicators of health & lifestyle risk factors which are hugely valuable in insurance calculations ... if only the data is permitted to be re-purposed like that.
The question is, what is it about the customer database of a given store or hotel that gives it an edge over its competitors? There many more attributes to think creatively about beyond C-I-A!
- It's important to rigorously check that the raw data, the metadata and any derived analytics can actually be put to different business purposes.
- Are data formats well-specified, and technically and semantically interoperable?
- What would it cost to improve interoperability as needed?
- Is the data physically available to your other business systems?
- Does the rest of the business know what's in the data sets?
- Do you know more about your customers than your competitors do?
- Do you supplement and enrich raw customer behaviours with questionaires, or linked data?
- How far back in time do the records go?
- Do you understand the reason any major gaps? Do the gaps themselves tell you anything?
- What sort of metadata do you have? For example, do you retain time & location, trend data, changes, origins and so on?
- Currency & Accuracy
- Is your data up to date? Remember that accuracy can diminish over time, so the sheer age of a long term database can have a downside.
- What mechanisms are there to keep data up to date?
- Permissions & Consent
- Have customers consented to secondary usage of data?
- Is the consent specific, blanket or bundled?
- Might customers be surprised and possibly put off to learn how their loyalty data is utilised?
- Do the terms & conditions of participation in a loyalty program cover what you wish to do with the data?
- Do the Ts&Cs (which might have been agreed to in the past) still align with the latest plans for data usage?
- Are there opportunities to refresh the Ts&Cs?
- Are there opportunities for customers to negotiate the value you can offer for re-purposing the data?
- When businesses derive new insights from data, it is possible that they are synthesising brand new Personal Information, and non-obvious privacy obligations can go along with that. The competitive advantage of Big Data can be squandered if regulations are overlooked, especially in international environments.
- So where is the data held, and where does it flow?
- Are applications for your data compliant with applicable regulations?
- Is health information or similar sensitive Personal Information extracted or synthesised, and do you have specific consent for that?
- Can you meet the Access & Correction obligations in many data protection regulations?
For more detail, my new report, "Strategic Opportunities for the CISO", is available now.
Facial recognition is digital alchemy. It's the prince of data mining.
Facial recognition takes previously anonymous images and conjures peoples' identities. It's an invaluable capability. Once they can pick out faces in crowds, trawling surreptitiously through anyone and everyone's photos, the social network businesses can work out what we're doing, when and where we're doing it, and who we're doing it with. The companies figure out what we like to do without us having to 'like' or favorite anything.
So Google, Facebook, Apple at al have invested hundreds of megabucks in face recognition R&D and buying technology start-ups. And they spend billions of dollars buying images and especially faces, going back to Google's acquisition of Picasa in 2004, and most recently, Facebook's ill-fated $3 billion offer for Snapchat.
But if most people find face recognition rather too creepy, then there is cause for optimism. The technocrats have gone too far. What many of them still don't get is this: If you take anonymous data (in the form of photos) and attach names to that data (which is what Facebook photo tagging does - it guesses who people are in photos are, attaches putative names to records, and invites users to confirm them) then you Collect Personal Information. Around the world, existing pre-biometrics era black letter Privacy Law says you can't Collect PII even indirectly like that without am express reason and without consent.
When automatic facial recognition converts anonymous data into PII, it crosses a bright line in the law.
Exploring new strategic opportunities for CIOs and CISOs.
For as long as we've had a distinct information security profession, it has been said that security needs to be a "business enabler". But what exactly does that mean? How can security professionals advance from their inherently defensive postures, into more strategic positions, and contribute actively to the growth of the business? This is the focus of my latest work at Constellation Research. It turns out that security professionals have special tools and skills ideally suited to a broader strategic role in information management.
The role of Chief Information Security Officer (CISO) is a tough one. Security is red hot. Not a week goes by without news of another security breach.
Information now is the lifeblood of most organisations; CISOs and their teams are obviously crucial in safeguarding that. But a purely defensive mission seldom allows for much creativity, or a positive reputation amongst one's peers. A predominantly reactive work mode -- as important as it is from day to day -- can sometimes seem precarious. The good news for CISOs' career security and job satisfaction is they happen to have special latent skills to innovate and build out those most important digital assets.
Information assets are almost endless: accounts, ledgers and other legal records, sales performance, stock lists, business plans, R&D plans, product designs, market analyses and forecasts, customer data, employee files, audit reports, patent specifications and trade secrets. But what is it about all this information that actually needs protecting? What exactly makes any data valuable? These questions take us into the mind of the CISO.
Security management is formally all about the right balance of Confidentiality, Integrity and Availability in the context of the business. Different businesses have different needs in these three dimensions.
Think of the famous industrial secrets like the recipes for KFC or Coca Cola. These demand the utmost confidentiality and integrity but the availability of the information can be low (nay, must be low) because it is accessed as a whole so seldomly. Medical records too have traditionally needed confidentiality more than availability, but that's changing. Complex modern healthcare demands electronic records, and these do need high availability especially in emergency care settings.
In contrast, for public information like stock prices there is no value in confidentiality whatsoever, and instead, availability and integrity are paramount. On the other hand, market-sensitive information that listed companies periodically report to stock exchanges must have very strict confidentiality for a relatively brief period.
Security professionals routinely compile Information Asset Inventories and plan for appropriate C-I-A for each type of data held. From there, a Threat & Risk Assessment (TRA) is generally undertaken, to examine the adverse events that might compromise the Confidentiality, Integrity and/or Availability. The likelihood and the impact of each adverse event are estimated and multiplied together to gauge the practical risk posed by each known threat. By prioritising counter-measures for the identified threats, in line with the organisation's risk appetite, the TRA helps guide a rational program of investment in security.
Now their practical experience can put CISOs in a special position to enhance their organisation's information assets rather than restrict themselves to hardening information against just the negative impacts.
Here's where the CISO's mindset comes into play in a new way. The real value of information lies not so much in the data itself as in its qualities. Remember the cynical old saw "It's not what you know, it's who you know". There's a serious side to the saying, which highlights that really useful information has pedigree.
So the real action is in the metadata; that is, data about data. It may have got a bad rap recently thanks to surveillance scandals, but various thinkers have long promoted the importance of metadata. For example, back in the 1980s, Citibank CEO Walter Wriston famously said "information about money will become almost as important as money itself". What a visionary endorsement of metadata!
The important latent skills I want to draw out for CISOs is their practiced ability to deal with the qualities of data. To bring greater value to the business, CISOs can start thinking about the broader pedigree of data and not merely its security qualities. They should spread their wings beyond C-I-A, to evaluate all sorts of extra dimensions, like completeness, reliability, originality, currency, privacy and regulatory compliance.
The core strategic questions for the modern CISO are these: What is it about your corporate information that gives you competitive advantage? What exactly makes information valuable?
The CISO has the mindset and the analytical tools to surface these questions and positively engage their executive peers in finding the answers.
My new Constellation Research report will be published soon.
In response to "The Solace of Oblivion", Jeffrey Toobin, The New Yorker, September 29th, 2014.
The "Right to be Forgotten" is an unfortunate misnomer for a balanced data control measure decided by the European Court of Justice. The new rule doesn't seek to erase the past but rather to restore some of its natural distance. Privacy is not about secrecy but moderation. Yet restraint is toxic to today's information magnates, and the response of some to even the slightest throttling of their control of data has been shrill. Google doth protest too much when it complains that having to adjust its already very elaborate search filters makes it an unwilling censor.
The result of a multi-billion dollar R&D program, Google's search engine is a great deal more than a latter-day microfiche. Its artificial intelligence tries to predict what users are really looking for, and as a result, its insights are all the more valuable to Google's real clients -- the advertisers. No search result is a passive reproduction of data from a "public domain". Google makes the public domain public. So if search reveals Personal Information that was hitherto impossible to find, then Google should at least participate in helping to temper the unintended privacy consequences.
October 1, 2014.
I was discussing definitions of Personally Identifiable Information (PII) with some lawyers today, one of whom took exception to the US General Services Administration definition: information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual". This lawyer concluded rather hysterically that under such a definition, "nobody can use the internet without a violation".
Similarly, I've seen engineers in Australia recoil at the possibility that IP and MAC Addresses might be treated as PII because it is increasingly easy to link them to the names of device owners. I was recently asked "Why are they stopping me collecting IP addresses?". The answer is, they're not.
There are a great many misconceptions about privacy, but the idea that 'if it's personal you can't use it' is by far the worst.
Nothing in any broad-based data privacy law I know of says personal information cannot be collected or used.
Rather, what data privacy laws actually say is: if you're collecting and using PII, be careful.
Privacy is about restraint. The general privacy laws of Australia, Europe and 100-odd countries say things like don't collect PII without consent, don't collect PII beyond what you demonstrably need, don't use PII collected for one purpose for other unrelated purposes, tell individuals if you can what PII you hold about them, give people access to the PII you have, and do not retain PII for longer than necessary.
Such rules are entirely reasonable, and impose marginal restrictions on the legitimate conduct of business. And they align very nicely with standard security practice which promotes the Need To Know principle and the Principle of Least Privilege.
Compliance with Privacy Principles does add some overhead to data management compared with anonymous data. If re-identification techniques and ubiquitous inter-connectedness means that hardly any data is going to stay anonymous anymore, then yes, privacy laws mean that data should be treated more cautiously than was previously the case. And what exactly is wrong with that?
If data is the new gold then it's time data custodians took more care.
A repeated refrain of cynics and “infomopolists” alike is that privacy is dead. People are supposed to know that anything on the Internet is up for grabs. In some circles this thinking turns into digital apartheid; some say if you’re so precious about your privacy, just stay offline.
But socialising and privacy are hardly mutually exclusive; we don’t walk around in public with our names tattooed on our foreheads. Why can’t we participate in online social networks in a measured, controlled way without submitting to the operators’ rampant X-ray vision? There is nothing inevitable about trading off privacy for conviviality.
The privacy dangers in Facebook and the like run much deeper than the self-harm done by some peoples’ overly enthusiastic sharing. Promiscuity is actually not the worst problem, neither is the notorious difficulty of navigating complex and ever changing privacy settings.
The advent of facial recognition presents far more serious and subtle privacy challenges.
Facebook has invested heavily in face recognition technology, and not just for fun. Facebook uses it in effect to crowd-source the identification and surveillance of its members. With facial recognition, Facebook is building up detailed pictures of what people do, when, where and with whom.
You can be tagged without consent in a photo taken and uploaded by a total stranger.
The majority of photos uploaded to personal albums over the years were not intended for anything other than private viewing.
Under the privacy law of Australia and data protection regulations in dozens of other jurisdictions, what matters is whether data is personally identifiable. The Commonwealth Privacy Act 1988 (as amended in 2014) defines “Personal Information” as: “information or an opinion about an identified individual, or an individual who is reasonably identifiable”.
Whenever Facebook attaches a member’s name to a photo, they are converting hitherto anonymous data into Personal Information, and in so doing, they become subject to privacy law. Automated facial recognition represents an indirect collection of Personal Information. However too many people still underestimate the privacy implications; some technologists naively claim that faces are “public” and that people can have no expectation of privacy in their facial images, ignoring that information privacy as explained is about the identifiability and identification of data; the words “public” and “private” don’t even figure in the Privacy Act!
If a government was stealing into our photo albums, labeling people and profiling them, there would be riots. It's high time that private sector surveillance - for profit - is seen for what it is, and stopped.
Ed Snowden was interviewed today as part of the New Yorker festival. This TechCruch report says Snowden "was asked a couple of variants on the question of what we can do to protect our privacy. His first answer called for a reform of government policies." He went on to add some remarks about Google, Facebook and encryption and that's what the report chose to focus on. The TechCrunch headline: "Snowden's Privacy Tips".
Mainstream and even technology media reportage does Snowden a terrible disservice and takes the pressure off from government policy.
I've listened to the New Yorker online interview. After being asked by a listener what they should do about privacy, Snowden gave a careful, nuanced, and comprehensive answer over five minutes. His very first line was this is an incredibly complex topic and he did well to stick to plain language throughout. He canvassed a great many issues including: the need for policy reform, the 'Nothing to Hide' argument, the inversion of civil rights when governments ask us to justify the right to be left alone, the collusion of companies and governments, the poor state of product security and usability, the chilling effect on industry of government intervention in security, metadata, and the radicalisation of computer scientists today being comparable with physicists in the Cold War.
Only after all that, and a follow up question about 'ordinary people', did Snowden say 'don't use Dropbox'.
Consistently, when Snowden is asked what to do about privacy, his answers are primarily about politics not technology. When pressed, he dispenses the odd advice about using Tor and disk encryption, but Snowden's chief concerns (as I have discussed in depth previously) are around accountability, government transparency, better cryptology research, better security product quality, and so on. He is no hacker.
I am simply dismayed how Snowden's sophisticated analyses are dumbed down to security tips. He has never been a "cyber Agony Aunt". The proper response to NSA overreach has to be agitation for regime change, not do-it-yourself cryptography. That is Snowden's message.
Tonight, Australian Broadcasting Corporation’s Four Corners program aired a terrific special, "Privacy Lost" written and produced by Martin Smith from the US public broadcaster PBS’s Frontline program.
Here we have a compelling demonstration of the importance and primacy of Collection Limitation for protecting our privacy.
UPDATE: The program we saw in Australia turns out to be a condensed version of PBS's two part The United States of Secrets from May 2014.
About the program
Martin Smith summarises brilliantly what we know about the NSA’s secret surveillance programs, thanks to the revelations of Ed Snowden, the Guardian’s Glenn Greenwald and the Washington Post’s Barton Gellman; he holds many additional interviews with Julia Angwin (author of “Dragnet Nation”), Chris Hoofnagle (UC Berkeley), Steven Levy (Wired), Christopher Soghoian (ACLU) and Tim Wu (“The Master Switch”), to name a few. Even if you’re thoroughly familiar with the Snowden story, I highly recommend “Privacy Lost” or the original "United States of Secrets" (which unlike the Four Corners edition can be streamed online).
The program is a ripping re-telling of Snowden’s expose, against the backdrop of George W. Bush’s PATRIOT Act and the mounting suspicions through the noughties of NSA over-reach. There are freshly told accounts of the intrigues, of secret optic fibre splitters installed very early on in AT&T’s facilities, scandals over National Security Letters, and the very rare case of the web hosting company Calyx who challenged their constitutionality (and yet today, with the letter withdrawn, remains unable to tell us what the FBI was seeking). The real theme of Smith’s take on surveillance then emerges, when he looks at the rise of data-driven businesses -- first with search, then advertising, and most recently social networking -- and the “data wars” between Google, Facebook and Microsoft.
In my view, the interplay between government surveillance and digital businesses is the most important part of the Snowden epic, and it receives the proper emphasis here. The depth and breadth of surveillance conducted by the private sector, and the insights revealed about what people might be up to creates irresistible opportunities for the intelligence agencies. Hoofnagle tells us how the FBI loves Facebook. And we see the discovery of how the NSA exploits the tracking that’s done by the ad companies, most notably Google’s “PREF” cookie.
One of the peak moments in “Privacy Lost” comes when Gellman and his specialist colleague Ashkan Soltani present their evidence about the PREF cookie to Google – offering an opportunity for the company to comment before the story is to break in the Washington Post. The article ran on December 13, 2013; we're told it was then the true depth of the privacy problem was revealed.
My point of view
Smith takes as a given that excessive intrusion into private affairs is wrong, without getting into the technical aspects of privacy (such as frameworks for data protection, and various Privacy Principles). Neither does he unpack the actual privacy harms. And that’s fine -- a TV program is not the right place to canvass such technical arguments.
When Gellman and Soltani reveal that the NSA is using Google’s tracking cookie, the government gets joined irrefutably to the private sector in a mass surveillance apparatus. And yet I am not sure the harm is dramatically worse when the government knows what Facebook and Google already know.
Privacy harms are tricky to work out. Yet obviously no harm can come from abusing Personal Information if that information is not collected in the first place! I take away from “Privacy Lost” a clear impression of the risks created by the data wars. We are imperiled by the voracious appetite of digital businesses that hang on indefinitely to masses of data about us, while they figure out ever cleverer ways to make money out of it. This is why Collection Limitation is the first and foremost privacy protection. If a business or government doesn't have a sound and transparent reason for having Personal Information about us, then they should not have it. It’s as simple as that.
Martin Smith has highlighted the symbiosis between government and private sector surveillance. The data wars not only made dozens of billionaires but they did much of the heavy lifting for the NSA. And this situation is about to get radically more fraught. On the brink of the Internet of Things, we need to question if we want to keep drowning in data.
The "Right to be Forgotten" debate reminds me once again of the cultural differences between technology and privacy.
On September 30, I was honoured to be part of a panel discussion hosted by the IEEE on RTBF; a recording can be viewed here. In a nutshell, the European Court of Justice has decided that European citizens have the right to ask search engine businesses to suppress links to personal information, under certain circumstances. I've analysed and defended the aims of the ECJ in another blog.
One of the IEEE talking points was why RTBF has attracted so much scorn. My answer was that some critics appear to expect perfection in the law; when they look at the RTBF decision, all they see is problems. Yet nobody thinks this or any law is perfect; the question is whether it helps improve the balance of rights in a complex and fast changing world.
It's a little odd that technologists in particular are so critical of imperfections in the law, when they know how flawed is technology. Indeed, the security profession is almost entirely concerned with patching problems, and reminding us there will never be perfect security.
Of course there will be unwanted side-effects of the new RTBF rules and we should trust that over time these will be reviewed and dealt with. I wish that privacy critics could be more humble about this unfolding environment. I note that when social conservatives complain about online pornography, or when police decry encryption as a tool of criminals, technologists typically play those problems down as the unintended consequences of new technologies, which on average overwhelmingly do good not evil.
And it's the same with the law. It really shouldn't be necessary to remind anyone that laws have unintended consequences, for they are the stuff of the entire genre of courtroom drama. So everyone take heart: the good guys nearly always win in the end.