Understanding biometrics and their necessary fallibility

Most lay people get their understanding of biometrics from watching science fiction movies, where people stare at a camera and money comes out. And unfortunately, some biometrics vendors even use sci-fi films in their sales presentations as if they’re case studies. In reality, biometrics just don’t work as portrayed.

Here we’ll spend just five or ten minutes looking a bit more deeply, to help set realistic expectations of this technology.

In practice, the most important thing about biometrics is their fallibility. Because of the vagaries of human traits, which vary from day to day, biometrics have to cope with the same person appearing a little different each time they front up. Inevitably this means that occasionally a biometric system will confuse one person with another. So what? Well, there are two major foibles of all biometrics that go unmentioned by most vendors:

1. There is an inherent trade-off in all biometrics between their ability to discriminate between different people (specificity) and their ability to reliably recognise all legitimate users (sensitivity). You can’t have it both ways: a system that is very specific will be more inclined to reject a legitimate user, and conversely, a system that never fails to recognise you will also tend to occasionally confuse you with someone else. Yet biometrics vendors often quote their best-case False Reject and False Accept figures side by side, as if they were achievable simultaneously (see the short sketch after this list).

2. The only way to improve sensitivity and specificity at the same time is to tighten the enrolment and scanning conditions and/or the mathematical models that underpin the algorithms. In other words, to make the systems choosier. This is why high-stakes biometrics, such as face recognition for passports and driver licences, demand stringent lighting conditions and image quality, and why we should be wary of biometrics in mobile devices, where there is almost no control over lighting and sound.
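To make the trade-off concrete, here is a minimal sketch in Python. The genuine and impostor score distributions are invented purely for illustration; the point is that a single decision threshold fixes both error rates, so tightening one inevitably loosens the other.

```python
# A minimal sketch of the sensitivity-specificity trade-off, using made-up
# Gaussian match-score distributions (all numbers are illustrative only).
import random

random.seed(42)

# Hypothetical similarity scores: genuine attempts score higher on average
# than impostor attempts, but the two distributions overlap.
genuine  = [random.gauss(0.70, 0.10) for _ in range(10_000)]
impostor = [random.gauss(0.45, 0.10) for _ in range(10_000)]

def rates(threshold):
    """Return (false_reject_rate, false_accept_rate) at a given threshold."""
    frr = sum(score < threshold for score in genuine) / len(genuine)
    far = sum(score >= threshold for score in impostor) / len(impostor)
    return frr, far

for t in (0.50, 0.60, 0.70):
    frr, far = rates(t)
    print(f"threshold={t:.2f}  FRR={frr:6.2%}  FAR={far:6.2%}")

# A strict (highly specific) threshold drives FAR down but FRR up;
# a lenient (highly sensitive) threshold does the opposite.
# No single threshold minimises both at once.
```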

Uncertainty accumulates

The least technical criticism of biometrics concerns the fallibility of all measurement methods. Cameras, sensors and microphones – like human eyes and ears – are imperfect, and the ability of a biometric authentication system to distinguish between subtly different people is limited by the precision of the input devices.

Even if the underlying biological traits of interest are truly unique, it does not follow that our machinery will be able to measure them faithfully. Take the iris. This biometric is often promoted with the impressive claim that the probability of two individuals’ iris patterns matching is one in 10 to the power of 78. These are literally astronomical odds, on the order of the number of atoms in the observable universe. Yet does this figure tell us how accurate the end-to-end biometric system really is? Consider that there are on the order of a hundred billion stars in the Milky Way. If two people look up at the night sky and each pick a star at random, is the probability of a match one in a hundred billion? Of course not, because of the limits of our measurement apparatus, in this case the naked eye. Interference, such as city light pollution, also limits the precision of any measurement; the odds of two people in a big city picking the same star might be no better than one in a hundred.
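The point can be made with a toy simulation. The figures below are rough assumptions (a few thousand stars resolvable with the naked eye under a dark sky, perhaps a hundred under city lights); what matters is that the chance of a coincidental match is governed by what the observer can resolve, not by how many stars actually exist.

```python
# Illustrative sketch of the star analogy: the collision probability is set
# by what the instrument can resolve, not by the size of the underlying
# population. The visibility figures are rough assumptions.
import random

random.seed(1)

STARS_IN_GALAXY   = 100_000_000_000  # the underlying population (never comes into play)
NAKED_EYE_VISIBLE = 3_000            # roughly what a dark-sky observer can resolve
CITY_SKY_VISIBLE  = 100              # under heavy light pollution, far fewer

def match_probability(distinguishable, trials=200_000):
    """Monte Carlo estimate of two observers independently picking the same star."""
    hits = sum(random.randrange(distinguishable) == random.randrange(distinguishable)
               for _ in range(trials))
    return hits / trials

print("dark sky:", match_probability(NAKED_EYE_VISIBLE))  # roughly 1 in 3,000
print("city sky:", match_probability(CITY_SKY_VISIBLE))   # roughly 1 in 100

# The one-in-a-hundred-billion figure plays no role: the measurement
# apparatus, not the underlying population, bounds the odds of a match.
```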

The Sensitivity-Specificity tradeoff: False Positives and False Negatives

Biometric authentication entails a long chain of processing steps, all of which are imperfect. Each step introduces a small degree of uncertainty, as shown in the schematic below. Uncertainty is inescapable even before the first processing step, because the body part being measured can never appear exactly the same. The angle and pressure of a finger on a scanner, the distance of a face from a camera, the tone and volume of the voice, the background noise and lighting, and the cleanliness of a lens all change from day to day. A biometric system cannot afford to be too sensitive to subtle variations, or else it will fail to recognise its target; a biometric must tolerate variation in its input, and inevitably this means the system can sometimes confuse its target with someone else.
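As a very rough illustration of how uncertainty builds up along that chain, the sketch below treats each stage as an independent source of error and combines them; the stages and noise figures are invented for the example, not measurements of any real system.

```python
# A rough sketch of how small, independent errors accumulate along the
# processing chain. The stages and noise levels are invented for illustration.
import math

# Assumed standard deviation of the error contributed by each stage,
# expressed as a fraction of the final match score.
stage_noise = {
    "presentation (angle, pressure, lighting)": 0.040,
    "sensor / optics":                          0.020,
    "segmentation and feature extraction":      0.030,
    "template matching":                        0.010,
}

# For independent error sources, variances add, so the combined uncertainty
# is the root of the sum of squares, larger than any single stage.
combined = math.sqrt(sum(sigma ** 2 for sigma in stage_noise.values()))

for stage, sigma in stage_noise.items():
    print(f"{stage:45s} sigma = {sigma:.3f}")
print(f"{'combined (root-sum-of-squares)':45s} sigma = {combined:.3f}")
```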


Therefore all biometric systems inevitably commit two types of error:

1. A “False Negative” is when the system fails to recognise someone who is legitimately enrolled. False Negatives arise if the system cannot cope with subtle changes to the person’s features, the way they present themselves to the scanner, slight variations between scanners at different sites, and so on.

2. A “False Positive” is when the system confuses a stranger with someone else who is already enrolled. This may result from the system being rather too tolerant of variability from one day to another, or from site to site.

False Positives and False Negatives are inescapably linked. If we wish to make a given biometric system more specific – so that it is less likely to confuse strangers with enrolled users – then it will inevitably become less sensitive, tending to wrongly reject legitimate enrolled users more often.

The following schematics illustrate how a highly specific biometric system tends to commit more False Negatives, while a highly sensitive system exhibits relatively more False Positives.


A design decision has to be made when implementing biometrics as to which type of error is less problematic. Where stopping impersonation is paramount, such as in a data centre or missile silo, a biometric system would be biased towards false negatives. Where user convenience is rated highly and where the consequences of fraud are not irreversible, as with Automatic Teller Machines, a biometric might be biased more towards false positives. For border control applications, the sensitivity-specificity trade-off is a very difficult problem, with significant downsides associated with both types of error – either immigration security breaches, or long queues of restless passengers.

Any biometric system, in principle at least, can be tuned towards higher sensitivity or higher specificity, depending on the overall desired balance of security versus convenience. The performance at different thresholds is conventionally shown by a “Detection Error Tradeoff” (DET) curve.
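As a rough illustration of how such a curve arises, the sketch below sweeps a decision threshold across simulated genuine and impostor score distributions and records the (FAR, FRR) pair at each setting. The score model is invented purely for illustration.

```python
# A minimal sketch of how a DET curve is generated: one (FAR, FRR) pair per
# threshold setting. The score distributions are made up for illustration.
import random

random.seed(7)
genuine  = [random.gauss(0.70, 0.10) for _ in range(20_000)]
impostor = [random.gauss(0.45, 0.10) for _ in range(20_000)]

det_curve = []
for i in range(101):
    t = i / 100  # threshold swept from 0.00 to 1.00
    frr = sum(s < t for s in genuine) / len(genuine)
    far = sum(s >= t for s in impostor) / len(impostor)
    det_curve.append((far, frr))

# A few operating points: as FAR falls, FRR climbs, tracing out the trade-off.
for far, frr in det_curve[40:90:10]:
    print(f"FAR={far:8.3%}  FRR={frr:8.3%}")
```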

Biometrics vendors tend to keep their DET curves confidential, and usually release commercial solutions where the ratio of False Accept Rate (FAR) to False Reject Rate (FRR) is fixed. The following DET curves are over ten years old but they remain some of the few examples that are publicly available, and they usefully compare several biometric technologies side by side.


Ref: “Biometric Product Testing Final Report”, Issue 1.0, 2001, UK Communications-Electronics Security Group (CESG).

Vendors occasionally specify the “Equal Error Rate” (EER) for their solutions. It’s important to understand what this specification is for. No real-world biometric that I’m aware of is deployed with FAR and FRR tuned to be the same. Instead, the EER should be used as a benchmark for broadly comparing different technologies.

EER provides another useful ready reckoner. If a vendor specifies, for example, FAR = 0.0001% and FRR = 0.01%, and yet you find that the EER is, say, 1% (that is, greater than both the quoted FAR and FRR), then you know that the vendor is quoting best-case figures that cannot be realised simultaneously. Just look at the DET curves above: when the False Accept Rate is 0.1% (i.e. false positives of 1 in 1,000), the False Reject Rate ranges from at least 5% to as much as 30%. And we can see that an FAR of 0.0001% is really extreme; for most biometrics, such specificity leads to False Rejects of one in two or worse, rendering the solution unusable.
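To make the ready reckoner concrete, here is a small sketch that works from a handful of made-up DET operating points (they are illustrative, not taken from the CESG report), estimates the EER as the point where FAR and FRR cross, and flags quoted figures that cannot describe a single operating point.

```python
# A sketch of the EER ready reckoner, using invented (FAR, FRR) operating
# points for a hypothetical product, ordered from lenient to strict.
det_points = [          # (FAR, FRR) pairs, expressed as fractions
    (0.100000, 0.002),
    (0.010000, 0.015),
    (0.001000, 0.060),  # FAR of 1 in 1,000
    (0.000100, 0.200),
    (0.000001, 0.550),  # FAR of 1 in a million
]

# The Equal Error Rate sits (approximately) where FAR and FRR cross.
eer_point = min(det_points, key=lambda p: abs(p[0] - p[1]))
eer = sum(eer_point) / 2
print(f"approximate EER: {eer:.1%}")

# Best-case figures quoted on a hypothetical datasheet.
claimed_far, claimed_frr = 0.000001, 0.0001   # 0.0001% and 0.01%

# If both claimed figures sit below the EER, they cannot hold at the same
# threshold: at any single operating point, at most one of them is achievable.
if claimed_far < eer and claimed_frr < eer:
    print("claimed FAR and FRR cannot be achieved simultaneously")
```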

Failure To Enrol

Over and above the issues of False Positives and False Negatives is the unfortunate fact that not everyone will be able to enrol in a given biometric authentication system. At the extremes, this reality is obvious: individuals with missing fingers or a severe speech impediment, for example, may never be able to use certain biometrics.

However, failure to enrol has a deeper significance for more ordinary users. To minimise False Positives and False Negatives at the same time (as illustrated in the next figure), a biometric method generally must tighten requirements on the quality of its input data. A fingerprint scanner, for instance, will perform better on high-definition images, where more fingerprint features can be reliably extracted. If a fingerprint detector sets a relatively stringent cut-off for image quality, then it may not be possible to enrol people who happen to have inherently faint fingerprints, such as the elderly, or those with particular skin conditions.
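A small sketch illustrates the effect, using invented image-quality distributions for two hypothetical groups of users; raising the quality cut-off helps matching accuracy for those who do enrol, but drives the failure-to-enrol rate up sharply for people with faint prints.

```python
# A rough sketch of how a quality cut-off produces failure to enrol.
# The quality distributions and cut-offs are invented for illustration.
import random

random.seed(3)

# Hypothetical fingerprint image-quality scores (0 = unusable, 1 = ideal).
typical_users = [random.gauss(0.75, 0.10) for _ in range(10_000)]
faint_prints  = [random.gauss(0.55, 0.10) for _ in range(10_000)]  # e.g. elderly users

def failure_to_enrol(scores, quality_cutoff):
    """Fraction of users whose quality score falls below the cut-off."""
    return sum(s < quality_cutoff for s in scores) / len(scores)

for cutoff in (0.50, 0.60, 0.70):
    print(f"cut-off={cutoff:.2f}  "
          f"typical FTE={failure_to_enrol(typical_users, cutoff):6.2%}  "
          f"faint-print FTE={failure_to_enrol(faint_prints, cutoff):6.2%}")
```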


More subtle still is the effect of modelling assumptions within biometric algorithms. In order to make sense of biological traits, the algorithm has to have certain expectations built into it as to how the features of interest generally appear and how those features vary across the population; after all, it is the quantifiable variation in features which allows for different individuals to be told apart. Therefore, face and voice recognition algorithms in particular might be optimised for the statistical characteristics of certain racial groups or nationalities, making it difficult for people from other groups to be enrolled.

The impossibility of enrolling 100% of the population into any biometric security system has important implications for public policy. Clearly there can be at least the perception of discrimination against certain minority groups, if factors like age, foreign accent, ethnicity, disabilities, and/or medical conditions impede the effectiveness of a biometric system. And careful consideration must be given to what fall-back security provisions will be offered to those who cannot be enrolled. If there is a presumption that a biometric somehow provides superior security, then special measures may be necessary to provide equivalent security for the un-enrolled minority.