In my recent post "Identity is in the eye of the beholder" I tried to unpack the language of "identity provision". I argued that IdPs do not and cannot "provide identity" because identification is carried out by Relying Parties.
It may seem like a sterile view in these days of user-centric 'self narrated' and 'bring-you-own identities' but I think the truth is that identity (for the purposes of approving transactions) is actually determined by Relying Parties. The state of being "identified" may be assisted (to a very great extent) by information provided by others including so-called "Identity" Providers but ultimately it is the RP that identifies me.
I note that the long standing dramaturgical analysis of social identity of Erving Goffman actually says the same thing, albeit in a softer way. That school of thought holds that identity is an emergent property, formed by the way we think others see us. In a social setting there are in effect many Relying Parties, all impressing upon us their sense of who we are. We reach an equilibrium over time, after negotiating all the different interrelating roles in the play of life. And the equilibrium can be starkly disrupted in what I've called the "High School Reunion Effect". So we do not actually curate our own identities with complete self-determination, but rather we allow our identities to be moulded dynamically to fit the expectations of those around us.
Now, in the digital realm, things are so much simpler, you might even say more elegant in an engineering fashion. I'd like to think that the dramaturgical frame sets a precedent for thinking in terms of having identities impressed upon us. We should not take umbrage at this, and we should temper what we mean by "user centric" identities: it need not mean freely expressing all of our identities for ourselves, but allowing for the fact that identity is shaped by what others need to know about us. In a great deal of business, identities are completely defined (imposed) by what the RP needs to know.
For more precision, maybe it would be useful to get into the habit of specifying the context whenever we talk of a Digital Identity. So here's a bit of mathematical nomenclature, but don't worry, it's not strenuous!
Let's designate the identification performed by a Relying Party RP on a Subject S as IRP-S.
If the RP has drawn on information provided by one "Identity Provider" (running with the dominant language for now), then we can write the identification as a function of the IdP:
Identification = IRP-S(IdP)
But it is still true that the end-point of identification is reached by the RP and not the IdP.
We can generalise from this to imagine Relying Parties drawing on more than one IdP in reaching the point where the subject is identified, to the satisfaction of the RP:
Identification = IRP-S(IdP1, IdP2)
And then we could take things one step further, to recognise that the distinction between "identity providers" and "attribute providers" is arbitrary. Fundamentally identities and attributes are just pieces of information that factor into an RP's decision to accept or reject a Subject. So the most general formulation would show identification being a function of a number of attributes verified by the RP either for itself or on its behalf by external attribute providers:
Identification = IRP-S(A1, A2,..., A2)
(where the source of the attribute information could be indicated in various ways).
The work we're trying to start in Australia on a Claims Verification ecosystem reflects this kind of thinking -- it may be more powerful and more practicable to have RPs assemble their knowledge of Subjects from a variety of sources.
"Here's your cool new identifier! It's so easy to use. No passwords to forget, no cards to leave behind. And it's so high tech, you're going to be able to use it for everything eventually: payments, banking, e-health, the Interwebs, unlocking your house or office, starting your car!
"Oh, one thing though, there are some little clues about your identifier around the place. Some clues are on the ATM, some clues are in Facebook and others in Siri. There may be a few in your trash. But it's nothing to worry about. It's hard for hackers to decipher the clues. Really quite hard.
"What's that you say? What if some hacker does figure out the puzzle? Gosh, um, we're not exactly sure, but we got some guys doing their PhDs on that issue. Sorry? Will we give you a new identifier in the meantime? Well, no actually, we can't do that right now. Ok, no other questions? Cool!
Posted in Biometrics
No it doesn't, it only means the end of anonymity.
Anonymity is not the same thing as privacy. Anonymity keeps people from knowing what you're doing, and it's a vitally important quality in many settings. But in general we usually want people (at least some people) to know what we're up to, so long as they respect that knowledge. That's what privacy is all about. Anonymity is a terribly blunt instrument for protecting privacy, and it's also fragile. If anonymity was all you have, then you're in deep trouble when someone manages to defeat it.
New information technologies have clearly made anonymity more difficult, yet it does not follow that we must lose our privacy. Instead, these developments bring into stark relief the need for stronger regulatory controls that compel restraint in the way third parties deal with Personal Information that comes into their possession.
A great example is Facebook's use of facial recognition. When Facebook members innocently tag one another in photos, Facebook creates biometric templates with which it then automatically processes all photo data (previously anonymous), looking for matches. This is how they can create tag suggestions, but Facebook is notoriously silent on what other applications it has for facial recognition. Now and then we get a hint, with, for example, news of the Facedeals start up last year. Facedeals accesses Facebook's templates (under conditions that remain unclear) and uses them to spot customers as they enter a store to automatically check them in. It's classic social technology: kinda sexy, kinda creepy, but clearly in breach of Collection, Use and Disclosure privacy principles.
And indeed, European regulators have found that Facebook's facial recognition program is unlawful. The chief problem is that Facebook never properly disclosed to members what goes on when they tag one another, and they never sought consent to create biometric templates with which to subsequently identify people throughout their vast image stockpiles. Facebook has been forced to shut down their facial recognition operations in Europe, and they've destroyed their historical biometric data.
So privacy regulators in many parts of the world have real teeth. They have proven that re-identification of anonymous data by facial recognition is unlawful, and they have managed to stop a very big and powerful company from doing it.
This is how we should look at the implications of the DNA 'hacking'. Indeed, Melissa Gymrek from the Whitehead Institute said in an interview: "I think we really need to learn to deal with the fact that we cannot ever make data sets truly anonymous, and that I think the key will be in regulating how we are allowed to use this genetic data to prevent it from being used maliciously."
Perhaps this episode will bring even more attention to the problem in the USA, and further embolden regulators to enact broader privacy protections there. Perhaps the very extremeness of the DNA hacking does not spell the end of privacy so much as its beginning.
Biometrics seems to be going gang busters in the developing world. I fear we're seeing a new wave of technological imperialism. In this post I will examine whether the biometrics field is mature enough for the lofty social goal of empowering the world's poor and disadvantaged with "identity".
The independent Center for Global Development has released a report "Identification for Development: The Biometrics Revolution" which looks at 160 different identity programs using biometric technologies. By and large, it's a study of the vital social benefits to poor and disadvantaged peoples when they gain an official identity and are able to participate more fully in their countries and their markets.
The CGD report covers some of the kinks in how biometrics work in the real world, like the fact that a minority of people can be unable to enroll and they need to be subsequently treated carefully and fairly. But I feel the report takes biometric technology for granted. In contrast, independent experts have shown there is insufficient science for biometric performance to be predicted in the field. I conclude biometrics are not ready to support such major public policy initiatives as ID systems.
The state of the science of biometrics
I recently came across a weighty assessment of the science of biometrics presented by one of the gurus, Jim Wayman, and his colleagues to the NIST IBPC 2010 biometric testing conference. The paper entitled "Fundamental issues in biometric performance testing: A modern statistical and philosophical framework for uncertainty assessment" should be required reading for all biometrics planners and pundits.
Here are some important extracts:
[Technology] testing on artificial or simulated databases tells us only about the performance of a software package on that data. There is nothing in a technology test that can validate the simulated data as a proxy for the “real world”, beyond a comparison to the real world data actually available. In other words, technology testing on simulated data cannot logically serve as a proxy for software performance over large, unseen, operational datasets. [p15, emphasis added].
In a scenario test, [False Non Match Rate and False Match Rate] are given as rates averaged over total transactions. The transactions often involve multiple data samples taken of multiple persons at multiple times. So influence quantities extend to sampling conditions, persons sampled and time of sampling. These quantities are not repeatable across tests in the same lab or across labs, so measurands will be neither repeatable nor reproducible. We lack metrics for assessing the expected variability of these quantities between tests and models for converting that variability to uncertainty in measurands.[p17].
To explain, a biometric "technology test" is when a software package is exercised on a standardised data set, usually in a bake-off such as NIST's own biometric performance tests over the years. And a "scenario test" is when the biometric system is tested in the lab using actual test subjects. The meaning of the two dense sentences underlined by me in the extracts is: technology test results from one data set do not predict performance on any other data set or scenario, and biometrics practitioners still have no way to predict the accuracy of their solutions in the real world.
The authors go on:
[To] report false match and false non-match performance metrics for [iris and face recognition] without reporting on the percentage of data subjects wearing contact lenses, the period of time between collection of the compared image sets, the commercial systems used in the collection process, pupil dilation, and lighting direction is to report "nothing at all". [pp17-18].
And they conclude, amongst other things:
[False positive and false negative] measurements have historically proved to be neither reproducible nor repeatable except in very limited cases of repeated execution of the same software package against a static database on the same equipment. Accordingly, "technology" test metrics have not aligned well with "scenario" test metrics, which have in turn failed to adequately predict field performance. [p22].
The limitations of biometric testing has repeatedly been stressed by no less an authority than the US FBI. In their State-of-the-Art Biometric Excellence Roadmap (SABER) Report the FBI cautions that:
For all biometric technologies, error rates are highly dependent upon the population and application environment. The technologies do not have known error rates outside of a controlled test environment. Therefore, any reference to error rates applies only to the test in question and should not be used to predict performance in a different application. [p4.10]
The SABER report also highlighted a widespread weakness in biometric testing, namely that accuracy measurements usually only look at accidental errors:
The intentional spoofing or manipulation of biometrics invalidates the “zero effort imposter” assumption commonly used in performance evaluations. When a dedicated effort is applied toward fooling biometrics systems, the resulting performance can be dramatically different. [p1.4]
A few years ago, the Future of Identity in the Information Society Consortium ("FIDIS", a research network funded by the European Community’s Sixth Framework Program) wrote a major report on forensics and identity systems. FIDIS looked at the spoofability of many biometrics modalities in great detail (pp 28-69). These experts concluded:
Concluding, it is evident that the current state of the art of biometric devices leaves much to be desired. A major deficit in the security that the devices offer is the absence of effective liveness detection. At this time, the devices tested require human supervision to be sure that no fake biometric is used to pass the system. This, however, negates some of the benefits these technologies potentially offer, such as high-throughput automated access control and remote authentication. [p69]
Biometrics in public policy
To me, biometrics is in an appalling and astounding state of affairs. The prevailing public understanding of how these technologies work is utopian, based probably on nothing more than science fiction movies, and the myth of biometric uniqueness. In stark contrast, scientists warn there is no telling how biometrics will work in the field, and the FBI warns that bench testing doesn't predict resistance to attack. It's very much like the manufacturer of a safe confessing to a bank manager they don't know how it will stand up in an actual burglary.
This situation has bedeviled enterprise and financial services security for years. Without anyone admitting it, it's possible that the slow uptake of biometrics in retail and banking (save for Japan and their odd hand vein ATMs) is a result of hard headed security officers backing off when they look deep into the tech. But biometrics is going gang busters in the developing world, with vendors thrilling to this much bigger and faster moving market.
The stakes are so very high in national ID systems, especially in the developing world, where resistance to their introduction is relatively low, for various reasons. I'm afraid there is great potential for technological imperialism, given the historical opacity of this industry and its reluctance to engage with the issues.
To be sure vendors are not taking unfair advantage of the developing world ID market, they need to answer some questions:
- Firstly, how do they respond to Jim Wayman, the FIDIS Consortium and the FBI? Is it possible to predict how finger print readers, face recognition and iris scanners are going to operate, over years and years, in remote and rural areas?
- In particular, how good is liveness detection? Can these solutions be trusted in unattended operation for such critical missions as e-voting?
- What contingency plans are in place for biometric ID theft? Can the biometric be cancelled and reissued if compromised? Wouldn't it be catastrophic for the newly empowered identity holder to find themselves cut out of the system if their biometric can no longer be trusted?
I had a letter published in Science magazine about the recently publicised re-identification of anonymously donated DNA data. It has been shown that there is enough named genetic information online, in genealogical databases for instance, that anonymous DNA posted in research databases can be re-identified. This is a sobering result, but does it mean that 'privacy is dead'? No.
The fact is that re-identification of erstwhile anonymous data represents a new act of collection of PII and as such is subject to the Collection Limitation Principle in privacy law around the world. This type of data processing is essentially the same as Facebook using biometric facial recognition to identify people in anonymous photos. European regulators recently found Facebook to have breached privacy law with its Tag Suggestions function and have forced Facebook to shut down their facial recognition feature.
I expect that the same legal principles to apply to the re-identification of DNA. That is, there are legal constraints on what can be done with 'anonymous' data no matter where you get it from: under most data privacy laws, attaching names to previously anonymous data constitutes a Collection of PII, and as such, is subject to consent rules and all sorts of other principles. As a result, bioinformatics researchers will have to tread carefully, justifying their ends and their means before ethics committees. And corporations who seek to exploit the ability to put names on anonymous genetic data may face the force of the law as Facebook did.
The text of my letter to Science follows, and after that, I'll keep posting follow ups.
Science 8 February 2013:
Vol. 339 no. 6120 pp. 647
Yaniv Erlich at the Whitehead Institute for Biomedical Research used his hacking skills to decipher the names of anonymous DNA donors ("Genealogy databases enable naming of anonymous DNA donor," J. Bohannon, 18 January, p. 262). A little-known legal technicality in international data privacy laws could curb the privacy threats of reverse identification from genomes. "Personal information" is usually defined as any data relating to an individual whose identity is readily apparent from the data. The OECD Privacy Principles are enacted in over 80 countries worldwide . Privacy Principle No. 1 states: "There should be limits to the collection of personal data and any such data should be obtained by lawful and fair means and, where appropriate, with the knowledge or consent of the data subject." The principle is neutral regarding the manner of collection. Personal information may be collected directly from an individual or indirectly from third parties, or it may be synthesized from other sources, as with "data mining."
Computer scientists and engineers often don't know that recording a person's name against erstwhile anonymous data is technically an act of collection. Even if the consent form signed at the time of the original collection includes a disclaimer that absolute anonymity cannot be guaranteed, re-identifying the information later signifies a new collection. The new collection of personal information requires its own consent; the original disclaimer does not apply when third parties take data and process it beyond the original purpose for collection. Educating those with this capability about the legal meaning of collection should restrain the misuse of DNA data, at least in those jurisdictions that strive to enforce the OECD principles.
It also implies that bioinformaticians working "with little more than the Internet" to attach names to samples may need ethics approval, just as they would if they were taking fresh samples from the people concerned.
Lockstep Consulting Pty Ltd
Five Dock Sydney, NSW 2046, Australia.
Let's assume Subject S donates their DNA, ostensibly anonymously, to a Researcher R1, under some consent arrangement which concedes there is a possibility that S will be re-identified. And indeed, some time later, an independent researcher R2 does identify S as belonging to the DNA sample. The fact that many commentators seem oblivious to is this: R2 has Collected Personal Information (or PII) about S.* If R2 has no relationship with S, then S has not consented to this new collection of her PII. In jurisdictions with strict Collection Limitation (like the EU, Australia and elsewhere) then it is a privacy breach for R2 to collect PII by way of DNA re-identification without express consent, regardless of whether R1 has conceded to S that it might happen. Even in the US, where the protections might not be so strict, there remains a question of ethics: should R2 conduct themselves in a manner that might be unlawful in other places?
* Footnote: The collection by R2 of PII about S in this case is indirect (or algorithmic), but nevertheless is a fresh collection. Logically, if anonymous data is converted into identifiable data, then PII has been collected. In most cases, the Collection Limitation principle in privacy is technology neutral. Privacy laws do not generally care how PII is collected; if PII is created by some process (such as re-identification) then privacy principles still apply. And this is what consumers would expect. If a Big Data process allows an organisation to work out insights about people without having to ask them explicit questions (which is precisely why Big Data is so valuable to business and governments), then the individuals concerned should expect that privacy principles still apply.
In an interview with Science Magazine on Jan 18, the Whitehead Institute's Melissa Gymrek discussed the re-identification methods, and the potential to protect against them. She concluded: "I think we really need to learn to deal with the fact that we cannot ever make data sets truly anonymous, and that I think the key will be in regulating how we are allowed to use this genetic data to prevent it from being used maliciously.".
I agree completely. We need regulations. Elsewhere I've argued that anonymity is an inadequate way to protect privacy, and that we need a balance of regulations and Privacy Enhancing Technologies. And it's for this reason that I am not fatalistic about the fact that anonymity can be broken, because we have the procedural means to see that privacy is still preserved.
Last week I had the very great pleasure of participating in the first MIT Legal Hackathon, organised by Dazza Greenwood and Thomas Hardjono for the MIT Media Lab, Kerberos Consortium and wwPass. I say first because they plan to hold a monthly hangout! I hope and expect that this will become a strong, dynamic new forum for multi-disciplined explorations of Digital Identity.
In Dazza's wrap-up of the event, he pondered the potential for "open public infrastructure for identity":
... like a big bus of some sort for essential claims from public or other sources, utilised foundationally for identity functions."
His idea builds out logically from a proposed system of claims verification services that I presented to the hackathon, and blogged about a few weeks ago. So for discussion, here's a further development of the schematic. A variety claims verification services would be made available over a common bus as Dazza suggested, and used by a Relying Party to assemble the particular fractions of information they decide will make up a Subject's identity in a given transaction context.
Something I really I like about this architecture is that it supports several different modes of identification. For one, it could be used in real time by an RP faced with a fresh user for the first time; the RP could in real time seek out 'attribute providers' in the OIX or Identity Metasystem way of working. Alternatively, for well-worn e-commerce transactions where the necessary claims are well known in advance, the Subject could put together a basket of claims in advance and carry them in an identity wallet to be presented directly to the RP.
The diagram also shows a visualisation of the claims of interest to the RP for the transaction at hand, and the necessary degree of confidence i each of them (i.e. 90% in name, residential address and date of birth). I discussed this way of looking at different claims sets as surfaces in another blog last year.
As we rethink identity orthodoxies in forums like the MIT Legal Hackathon, I propose we shift perspectives a little. For instance:
- We should drop down a level, and focus on ways to exchange information about elements of identity, rather than rolled-up "identities" themselves; that is, we should fractionate identity into its important component parts, guided by transaction context.
- When building identification services frameworks, we should avoid imposing particular business protocols on organisations, so they remain free to select which claims and combinations of claims they want Subjects to exhibit.
- We can avoid technicalities like the difference between "authentication" and "authorization", and indeed we can remove ourselves from the philosophical debates over "identity"; the proposal simply provides uniform market-based mechanisms for parties to assert and test elemental claims as a precursor to doing business.
- Life looks much simpler under the neutral definition of "authentication" adopted by the APEC eSecurity Task Group over a decade ago: the means by which a receiver of an electronic transaction or message makes a decision to accept or reject that transaction or message.
None of this is actually radical. We've always thought about claims and attributes, all the authentication protocols deal with attributes, the good old Laws of Identity were actually all about claims, and there is infrastructure to deal with claims.
I think we just need to shift focus. We technologists shouldn't be so preoccupied with identity per se; let businesses continue to sort out identities as they see fit, and just give them the means to deal digitally with component claims. Let's not put IdPs ahead of APs. It may turn out we don't need IdPs at all. It's all about the claims, and only about the claims.
That is to say, identity is in the eye of the Relying Party.
The word "identity" seems increasingly problematic to me. It's full of contradictions. On the one hand, it's a popular view that online identity should be "user centric"; many commentators call for users to be given greater determination in how they are identified. People like the idea of "narrating" their own identities, and "bringing their own identity" to work. Yet it's not obvious how governments, banks, healthcare providers or employers for instance can grant people much meaningful say in how they are identified, for it is the Relying Party that bears most risk in the event identification goes wrong. These sorts of organisations impress their particular forms of identity upon us in order to formalise the relationship they have with us and manage our access to services.
The language of orthodox Federated Identity institutionalises the idea that identity is a good that is "provided" to us through a supply formal chain elaborated in architectures like the Open Identity Exchange (OIX). It might make sense in some low risk settings for individuals to exercise a choice of IdPs, for example choosing between Facebook or Twitter to log on to a social website, but users still don't have much influence on how the IdPs operate, nor on the decision made by Relying Parties about which IdPs they elect to recognise. Think about the choice we have of credit cards: you might for instance prefer to use Diners Club over MasterCard, but if you're shopping at a place that doesn't accept Diners, your "choice" is constrained. You cannot negotiate in real time to have the store accept your chosen instrument (instead your choice is to get yourself a MasterCard, or go to a different store).
I think the concept of "identity" is so fluid that we should probably stop using it. Or at least use it with much more precision.
I'd like you to consider that "Identity Providers" do not in fact provide identity. They really can't provide identity at all, but only attributes that are put together by Relying Parties in order to decide if a user or customer is legitimate. The act of identification is a core part of risk management. Identification means getting to know a Subject so as to make certain risks more manageable. And it's always done by a Relying Party.
An issued Digital Identity is the outcome of an identification process in which claims about a Subject are verified, to the satisfaction of the Relying Party. An "identity" is basically a handle by which the Subject is known. Recall that the Laws of Identity usefully defined a Digital Identity as a set of claims about the Digital Subject. And we all know that identity is highly context dependent; on its own, an identity like "Acct No. 12345678" means little or nothing without knowing the context as well.
This line of reasoning reminds me once again of the technology neutral, functional definition of "authentication" used by the APEC eSecurity Task Group over a decade ago: the means by which a receiver of an electronic transaction or message makes a decision to accept or reject that transaction or message. Wouldn't life be so much simpler if we stopped overloading some bits of authentication knowledge with the label "identity" and going to such lengths to differentiate other bits of knowledge as "attributes"? What we need online is better means for reliably conveying precise pieces of information about each other, relevant to the transaction at hand. That's all.
Carefully unpacking the language of identity management, we see that no Identity Provider ever actually "identifies" people. In realty, identification is always done by Relying Parties by pulling together what they need to know about a Subject for their own purposes. One IdP might say "This is Steve Wilson", another "This is Stephen Kevin Wilson", another "This is @Steve_Lockstep", another "This is Stephen Wilson, CEO of Lockstep" and yet another "This is Stephen Wilson at 100 Park Ave Jonestown Visa 4000 1234 5678 9012". My "identity" is different at every RP, each to their need.
See also An Algebra of Identity.
Remember the practical experience of IdPs. Despite a decade of hard work, there are still no Relying Parties accepting general purpose identities at LOA 3 or LOA 4. For high risk transactions, RPs like banks and government agencies still prefer to issue their own identities. Seamless operation of IdPs -- where an RP can accept an identity from an external IdP without checking any other authentication signals -- only works at LOA 0, where Identity Providers like Twitter and Facebook don't know who you really are, and the Relying Party doesn't care who you are. And at intermediate levels of risk, we sometimes see the crazy situation where RPs seek to negotiate LOA "one and a half" because they're not satisfied by the assurances provided by IdPs at vanilla LOA 1 and LOA 2. Customising identities makes a mockery of federation and creates new contractual silos.
So much of this complexity would melt away if we dropped down a level, federated concrete attributes instead of abstract "identities", re-cast IdPs as Attribute Providers, stopped trying to pigeonhole risk and identity, and left all RPs free to identify their users in their own unique contexts.