Surfacing identity

Update 19 May 2014: I changed “assertions” to “attributes” in the body of the blog, to use the term that is more popular right now. See how Bob Pinheiro in his comment rightly used the terms attributes/assertions interchangeably. I’m satisfied that attributes, assertions and claims are synonymous for the purposes of “identity management”.

Update 11 Feb 2023: I was using “attribute”, “assertion”, “claim” and so on synonymously in this and other works (more recently I have pivoted to what might be the simplest word: “fact”). I want to acknowledge a very useful explanation from the late great Kim Cameron about why he used “claim” in the Laws of Identity and how that term differs from the rest. Kim used “claim” to mean an assertion which is in doubt. That is, the assertion on its own is not necessarily true, and needs corroborating. That resonates with me because it goes to the importance of metadata. If a fact is in doubt, then relying parties will look for additional facts about the facts, such as origin, age, permissions, Ts&Cs for use, the algorithms and other conditions that created the fact, and so on. This is all metadata. Verifiable data will be presented as combinations of data and metadata.

Original post

The metaphor of a spectrum is often used to describe a sliding scale of knowingness. The degree to which someone is known is shown to range from zero (anonymity) up to some maximum (i.e. “verified identity”), passing through pseudonymity and self-asserted identity along the way. It’s a useful way of characterising some desirable features of identity management; it’s definitely good to show that in different settings, we need to know different things about people. But the spectrum is something of an oversimplification, and it is at odds with modern risk management. While it’s great to legitimise the plurality of identities (by illustrating how we can maintain several identities at different points on a spectrum), the metaphor is problematic. Spectra are linear, with just one independent variable, whereas risk management is multi-dimensional. The metaphor implies that identities can be ordered from weak to strong (they can’t) and insidiously suggests that identities at the right-hand end of the scale are superior.

A Digital Identity is a set of claims (aka attributes) that are meaningful in some context [Ref: Kim Cameron’s Laws of Identity]. When an Identity Provider (IdP) identifies me in their context, what they’re doing is testing and vouching for a closed set of n attributes: {A₁, A₂, …, A_n}. When a Relying Party (RP) wants to identify me, they need to be satisfied about a number of particular attributes relevant to their business; let’s say there are m of them: {A_i, A_ii, …, A_m}. These sets need not coincide: the things about me that matter to an RP may or may not be the same things that the IdP is able to assert about me.

Meaningful Identity Federation requires, at the very least, that (1) the RP’s m attributes are a subset of the IdP’s n attributes, and (2) the IdP has tested each attribute to an acceptable level of confidence for the RP’s purposes. When designing a federation, the sets of attributes for all anticipated RPs need to be defined in advance, together with the required confidence levels. Closing the “attribute space” and quantifying all its dimensions is a huge challenge.
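The two conditions can be stated compactly. This is only a sketch: the symbols c (the confidence with which the IdP has tested an attribute) and t (the minimum confidence the RP will accept for it) are shorthand introduced here for illustration and do not come from any standard.

```latex
% A_IdP : the n attributes the IdP tests and vouches for
% A_RP  : the m attributes the RP needs to be satisfied about
% c(a)  : IdP's confidence in attribute a   (illustrative symbol)
% t(a)  : RP's required confidence in a     (illustrative symbol)
\[
  A_{\mathrm{RP}} \subseteq A_{\mathrm{IdP}}
  \qquad\text{and}\qquad
  c(a) \ge t(a) \quad \text{for every } a \in A_{\mathrm{RP}}
\]
```

Both parts must hold at once, which is why a single position on a one-dimensional scale cannot capture whether an IdP is fit for a given RP.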

When we look at identification risk management in a more multi-dimensional way, each identity looks more like a surface in a multidimensional space than a simple point on a 1D line. For example, let’s imagine that a general purpose IdP ascertains and vouches for six attributes: given name, home address, date of birth, educational qualifications, residency and gender. The IdP gauges the accuracy with which it can assert each attribute as follows:

A1 Given name 90%
A2 Address 90%
A3 DOB 90%
A4 Gender 35%
A5 Qualifications 25%
A6 Residency 25%

For this Identity Provider to be useful to any given Relying Party, the attributes need to be of interest to the RP, and they have to be asserted with a minimum accuracy. Consider RP1, a bank, which needs to be sure of a customer’s name, address and date of birth to at least 80% confidence under applicable KYC rules, and doesn’t need to know anything else. We can plot RP1’s identity expectation and compare it with the IdP’s attributes. All well and good in this case, for the IdP covers the RP: it asserts all three attributes at 90%, above the 80% the bank requires.

Now consider RP2, an adult social networking service. All it wants to know is that its anonymous customers are at least 18 years of age. Its requirement for Attribute 3 (date of birth) is 90%, and it doesn’t care about anything else. So again, the IdP meets the needs of this RP (assuming that the identity management technology allows for selective disclosure of just the relevant attribute and hides all the others): it asserts date of birth at exactly the 90% required.

Finally, let’s look at a hospital employing a casual doctor. Credentialing rules and malpractice risk mean that the hospital is more interested in the individual’s qualifications and residency (which must be known with 90% confidence) than their name and address (50%). And now we see that RP3’s requirements are not covered by this particular IdP, which vouches for qualifications and residency at only 25%.
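The three comparisons above can be reproduced with a few lines of Python. This is a minimal sketch, not a real federation check: the dictionary keys and the helper names `shortfalls` and `covers` are hypothetical, confidences are written as whole percentages, and an attribute the IdP does not assert at all is assumed to count as 0% confidence.

```python
# Confidence (in percent) with which the example IdP asserts each attribute.
IDP = {
    "given_name": 90,
    "address": 90,
    "dob": 90,
    "gender": 35,
    "qualifications": 25,
    "residency": 25,
}

# Minimum confidence each relying party requires.
# Attributes an RP does not care about are simply left out.
RP1_BANK = {"given_name": 80, "address": 80, "dob": 80}
RP2_SOCIAL = {"dob": 90}
RP3_HOSPITAL = {
    "qualifications": 90,
    "residency": 90,
    "given_name": 50,
    "address": 50,
}


def shortfalls(idp, rp):
    """Map each unmet RP requirement to a (required, offered) pair.

    Assumption: an attribute missing from the IdP is treated as 0% confidence.
    """
    return {
        attr: (required, idp.get(attr, 0))
        for attr, required in rp.items()
        if idp.get(attr, 0) < required
    }


def covers(idp, rp):
    """True when the IdP meets every one of the RP's requirements."""
    return not shortfalls(idp, rp)


for name, rp in [
    ("RP1 bank", RP1_BANK),
    ("RP2 social network", RP2_SOCIAL),
    ("RP3 hospital", RP3_HOSPITAL),
]:
    gaps = shortfalls(IDP, rp)
    status = "covered" if not gaps else f"not covered, gaps: {gaps}"
    print(f"{name}: {status}")
```

Note that the check is made attribute by attribute. The IdP is neither "stronger" nor "weaker" than the hospital's requirement overall: it exceeds it on name and address and falls short on qualifications and residency.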

Returning to the idea of a spectrum, there is no sliding scale from anonymity up to “full” identity. Neither can trust in an identity be pinpointed somewhere between LOA 1 and LOA 4. In general, the more serious an identity gets, the more complex and multivariate is the set of attributes that it covers. I’m afraid the pseudonymous social logon experience at LOA 1 doesn’t pave the way to more serious multifaceted identity federation “at the other end” of a spectrum. It’s not like simply turning up the heat to step up from cold to hot.