Lockstep

Mobile: +61 (0) 414 488 851
Email: swilson@lockstep.com.au

Utopia

"Here's your cool new identifier! It's so easy to use. No passwords to forget, no cards to leave behind. And it's so high tech, you're going to be able to use it for everything eventually: payments, banking, e-health, the Interwebs, unlocking your house or office, starting your car!

"Oh, one thing though, there are some little clues about your identifier around the place. Some clues are on the ATM, some clues are in Facebook and others in Siri. There may be a few in your trash. But it's nothing to worry about. It's hard for hackers to decipher the clues. Really quite hard.

"What's that you say? What if some hacker does figure out the puzzle? Gosh, um, we're not exactly sure, but we got some guys doing their PhDs on that issue. Sorry? Will we give you a new identifier in the meantime? Well, no actually, we can't do that right now. Ok, no other questions? Cool!

"Good luck!"

Posted in Biometrics

The beginning of privacy!

The headlines proclaim that the newfound ability to re-identify anonymous DNA donors means The End Of Privacy!.

No it doesn't, it only means the end of anonymity.

Anonymity is not the same thing as privacy. Anonymity keeps people from knowing what you're doing, and it's a vitally important quality in many settings. But in general we usually want people (at least some people) to know what we're up to, so long as they respect that knowledge. That's what privacy is all about. Anonymity is a terribly blunt instrument for protecting privacy, and it's also fragile. If anonymity was all you have, then you're in deep trouble when someone manages to defeat it.

New information technologies have clearly made anonymity more difficult, yet it does not follow that we must lose our privacy. Instead, these developments bring into stark relief the need for stronger regulatory controls that compel restraint in the way third parties deal with Personal Information that comes into their possession.

A great example is Facebook's use of facial recognition. When Facebook members innocently tag one another in photos, Facebook creates biometric templates with which it then automatically processes all photo data (previously anonymous), looking for matches. This is how they can create tag suggestions, but Facebook is notoriously silent on what other applications it has for facial recognition. Now and then we get a hint, with, for example, news of the Facedeals start up last year. Facedeals accesses Facebook's templates (under conditions that remain unclear) and uses them to spot customers as they enter a store to automatically check them in. It's classic social technology: kinda sexy, kinda creepy, but clearly in breach of Collection, Use and Disclosure privacy principles.

And indeed, European regulators have found that Facebook's facial recognition program is unlawful. The chief problem is that Facebook never properly disclosed to members what goes on when they tag one another, and they never sought consent to create biometric templates with which to subsequently identify people throughout their vast image stockpiles. Facebook has been forced to shut down their facial recognition operations in Europe, and they've destroyed their historical biometric data.

So privacy regulators in many parts of the world have real teeth. They have proven that re-identification of anonymous data by facial recognition is unlawful, and they have managed to stop a very big and powerful company from doing it.

This is how we should look at the implications of the DNA 'hacking'. Indeed, Melissa Gymrek from the Whitehead Institute said in an interview: "I think we really need to learn to deal with the fact that we cannot ever make data sets truly anonymous, and that I think the key will be in regulating how we are allowed to use this genetic data to prevent it from being used maliciously."

Perhaps this episode will bring even more attention to the problem in the USA, and further embolden regulators to enact broader privacy protections there. Perhaps the very extremeness of the DNA hacking does not spell the end of privacy so much as its beginning.

Posted in Social Media, Science, Privacy, Biometrics, Big Data

Technological imperialism

Abstract

Biometrics seems to be going gang busters in the developing world. I fear we're seeing a new wave of technological imperialism. In this post I will examine whether the biometrics field is mature enough for the lofty social goal of empowering the world's poor and disadvantaged with "identity".

The independent Center for Global Development has released a report "Identification for Development: The Biometrics Revolution" which looks at 160 different identity programs using biometric technologies. By and large, it's a study of the vital social benefits to poor and disadvantaged peoples when they gain an official identity and are able to participate more fully in their countries and their markets.

The CGD report covers some of the kinks in how biometrics work in the real world, like the fact that a minority of people can be unable to enroll and they need to be subsequently treated carefully and fairly. But I feel the report takes biometric technology for granted. In contrast, independent experts have shown there is insufficient science for biometric performance to be predicted in the field. I conclude biometrics are not ready to support such major public policy initiatives as ID systems.

The state of the science of biometrics

I recently came across a weighty assessment of the science of biometrics presented by one of the gurus, Jim Wayman, and his colleagues to the NIST IBPC 2010 biometric testing conference. The paper entitled "Fundamental issues in biometric performance testing: A modern statistical and philosophical framework for uncertainty assessment" should be required reading for all biometrics planners and pundits.

Here are some important extracts:

[Technology] testing on artificial or simulated databases tells us only about the performance of a software package on that data. There is nothing in a technology test that can validate the simulated data as a proxy for the “real world”, beyond a comparison to the real world data actually available. In other words, technology testing on simulated data cannot logically serve as a proxy for software performance over large, unseen, operational datasets. [p15, emphasis added].

In a scenario test, [False Non Match Rate and False Match Rate] are given as rates averaged over total transactions. The transactions often involve multiple data samples taken of multiple persons at multiple times. So influence quantities extend to sampling conditions, persons sampled and time of sampling. These quantities are not repeatable across tests in the same lab or across labs, so measurands will be neither repeatable nor reproducible. We lack metrics for assessing the expected variability of these quantities between tests and models for converting that variability to uncertainty in measurands.[p17].

To explain, a biometric "technology test" is when a software package is exercised on a standardised data set, usually in a bake-off such as NIST's own biometric performance tests over the years. And a "scenario test" is when the biometric system is tested in the lab using actual test subjects. The meaning of the two dense sentences underlined by me in the extracts is: technology test results from one data set do not predict performance on any other data set or scenario, and biometrics practitioners still have no way to predict the accuracy of their solutions in the real world.

The authors go on:

[To] report false match and false non-match performance metrics for [iris and face recognition] without reporting on the percentage of data subjects wearing contact lenses, the period of time between collection of the compared image sets, the commercial systems used in the collection process, pupil dilation, and lighting direction is to report "nothing at all". [pp17-18].

And they conclude, amongst other things:

[False positive and false negative] measurements have historically proved to be neither reproducible nor repeatable except in very limited cases of repeated execution of the same software package against a static database on the same equipment. Accordingly, "technology" test metrics have not aligned well with "scenario" test metrics, which have in turn failed to adequately predict field performance. [p22].

The limitations of biometric testing has repeatedly been stressed by no less an authority than the US FBI. In their State-of-the-Art Biometric Excellence Roadmap (SABER) Report the FBI cautions that:

For all biometric technologies, error rates are highly dependent upon the population and application environment. The technologies do not have known error rates outside of a controlled test environment. Therefore, any reference to error rates applies only to the test in question and should not be used to predict performance in a different application. [p4.10]

The SABER report also highlighted a widespread weakness in biometric testing, namely that accuracy measurements usually only look at accidental errors:

The intentional spoofing or manipulation of biometrics invalidates the “zero effort imposter” assumption commonly used in performance evaluations. When a dedicated effort is applied toward fooling biometrics systems, the resulting performance can be dramatically different. [p1.4]

A few years ago, the Future of Identity in the Information Society Consortium ("FIDIS", a research network funded by the European Community’s Sixth Framework Program) wrote a major report on forensics and identity systems. FIDIS looked at the spoofability of many biometrics modalities in great detail (pp 28-69). These experts concluded:

Concluding, it is evident that the current state of the art of biometric devices leaves much to be desired. A major deficit in the security that the devices offer is the absence of effective liveness detection. At this time, the devices tested require human supervision to be sure that no fake biometric is used to pass the system. This, however, negates some of the benefits these technologies potentially offer, such as high-throughput automated access control and remote authentication. [p69]

Biometrics in public policy

To me, biometrics is in an appalling and astounding state of affairs. The prevailing public understanding of how these technologies work is utopian, based probably on nothing more than science fiction movies, and the myth of biometric uniqueness. In stark contrast, scientists warn there is no telling how biometrics will work in the field, and the FBI warns that bench testing doesn't predict resistance to attack. It's very much like the manufacturer of a safe confessing to a bank manager they don't know how it will stand up in an actual burglary.

This situation has bedeviled enterprise and financial services security for years. Without anyone admitting it, it's possible that the slow uptake of biometrics in retail and banking (save for Japan and their odd hand vein ATMs) is a result of hard headed security officers backing off when they look deep into the tech. But biometrics is going gang busters in the developing world, with vendors thrilling to this much bigger and faster moving market.

The stakes are so very high in national ID systems, especially in the developing world, where resistance to their introduction is relatively low, for various reasons. I'm afraid there is great potential for technological imperialism, given the historical opacity of this industry and its reluctance to engage with the issues.

To be sure vendors are not taking unfair advantage of the developing world ID market, they need to answer some questions:


  • Firstly, how do they respond to Jim Wayman, the FIDIS Consortium and the FBI? Is it possible to predict how finger print readers, face recognition and iris scanners are going to operate, over years and years, in remote and rural areas?

  • In particular, how good is liveness detection? Can these solutions be trusted in unattended operation for such critical missions as e-voting?

  • What contingency plans are in place for biometric ID theft? Can the biometric be cancelled and reissued if compromised? Wouldn't it be catastrophic for the newly empowered identity holder to find themselves cut out of the system if their biometric can no longer be trusted?

Posted in Security, Identity, Culture, Biometrics

The fundamental privacy challenges in biometrics

The EPIC privacy tweet chat of October 16 included "the Privacy Perils of Biometric Security". Consumers and privacy advocates are often wary of this technology, sometimes fearing a hidden agenda. To be fair, function creep and unauthorised sharing of biometric data are issues that are anticipated by standard data protection regulations and can be well managed by judicious design in line with privacy law.

However, there is a host of deeper privacy problems in biometrics that are not often aired.


  • The privacy policies of social media companies rarely devote reasonable attention to biometric technologies like facial recognition or natural language processing. Only recently has Facebook openly described its photo tagging templates. Apple on the other hand continues to be completely silent about Siri in its Privacy Policy, despite the fact that when Siri takes dictated emails and text messages, Apple is collecting and retaining without limit personal telecommunications that are strictly out of bounds even to the carriers! Some speculate that biometric voice recognition is a natural next step for Siri, but it's not a step that can be taken without giving notice today that personally identifiable voice data may in future be used for that purpose.
  • Personal Information (in Australia) is defined in the law as "information or an opinion ... whether true or not about an individual whose identity is apparent ..." [emphasis added]. This definition is interesting in the context of biometrics. Because biometrics are fuzzy, we can regard a biometric identification as a sort of opinion. Technically, a biometric match is declared when the probability of a scanned trait corresponding to an enrolled template exceeds some preset threshold, like 95%. When a false match results, mistaking say "Alice" for "Bob", it seems to me that the biometric system has created Personal Information about both Alice and Bob. There will be raw data, templates, audit files and metadata in the system pertaining to both individuals, some of it right and some of it wrong, but all of which needing to be accounted for under data protection and information privacy law.
  • In privacy, proportionality is important. The foremost privacy principle is Collection Limitation: organisations must not collect more personal information than they reasonably need to carry out their business. Biometric security is increasingly appearing in mundane applications with almost trivial security requirements, such as school canteens. Under privacy law, biometrics implementations in these sorts of environments may be hard to justify.
  • Even in national security deployments, biometrics lead to over-collection, exceeding what may be reasonable. Very little attention is given in policy debates to exception management, such as the cases of people who cannot enroll. The inevitable failure of some individuals to enroll in a biometric can have obvious causes (like missing digits or corneal disease) and not so obvious ones. The only way to improve false positive and false negative performance for a biometric at the same time is to tighten the mathematical modelling underpinning the algorithm (see also "Failure to enroll" at http://lockstep.com.au/blog/2012/05/06/biometrics-must-be-fallible). This can constrain the acceptable range of the trait being measured leading to outliers being rejected altogether. So for example, accurate fingerprint scanners need to capture a sharp image, making enrollment sometimes difficult for the elderly or manual workers. It's not uncommon for a biometric modality to have a Fail-to-Enroll rate of 1%. Now, what is to be done with those unfortunates who cannot use the biometric? In the case of border control, additional identifying information must be collected. Biometric security sets what the public are told is a 'gold standard' for national security, so there is a risk that individuals who for no fault of their own are 'incompatible' with the technology will form a de facto underclass. Imagine the additional opprobrium that would go with being in a particular ethnic or religious minority group and having the bad luck to fail biometric enrollment. The extra interview questions that go with sorting out these outliers at border control points is a collection necessitated not by any business need but rather the pitfalls of the technology.
  • And finally, there is something of a cultural gap between privacy and technology that causes blind spots amongst biometrics developers. Too many times, biometrics advocates misapprehend what information privacy is all about. It's been said more than once that "faces are not private" and there is "no expectation or privacy" with regards to one's face in public. Even if they were true, these judgement calls are moot, for information privacy laws are concerned with any data about identifiable individuals. So when facial recognition technology takes anonymous imagery from CCTV or photo albums and attaches names to it, Personal Information is being collected, and the law applies. It is this type of crucial technicality that Facebook has smacked into headlong in Germany.

Posted in Privacy, Biometrics

A response to M2SYS on reverse engineering

M2SYS posted on their blog a critique of the recent reverse engineering of iris templates. In my view, they misunderstand or misrepresent the significance of this sort of research. Their arguments merit rebuttal but the M2SYS blog is not accepting comments, and they seem reluctant to engage on these important issues on Twitter.

Here below is what I tried to post in response.

See also my post about the double standard in how biometrics proponents treat adverse research in comparison with serious cryptographers.

"You're right that reporting of the Black Hat results should not overstate the problem. By the same token, advocates for biometrics should be careful with their balance too. For example, is it fair to say as you do that biometrics are 'nearly impossible' to reverse engineer? And should Securlinx's Barry Hodge play down the reverse engineering as only 'intellectually interesting'?

"The point is not that iris scanning will suddenly be defeated left and right -- you're right the practical risk of spoofing is not widespread nor immediate. But this work and the publicity it attracts serves a useful purpose if it fosters more critical thinking. Most lay people out there get their understanding of biometrics from science fiction movies. Without needing to turn people into engineers, they ought to have a better handle on the technology and realities such as the false positive (security) / false negative (usability) tradeoff, and spoofing.

"My observation is that biometrics advocates have transitioned from more or less denying the possibility of reverse engineering, to now maintaining that it really doesn't matter. But until the industry comes up with a revokable biometric, I think it is only prudent to treat seriously even remote prospects of spoofing."

Posted in Biometrics

The double standard in biometrics analysis

The reverse engineering of biometric iris templates reported at Blackhat this month has attracted deserved attention. Iris now joins face and fingerprint as modalities that have been reverse engineered; that is, it has proved possible to synthesise an image that when processed by the algorithm in question, produces a match against a target template.

The biometrics industry reacts to these sorts of results in a way unbefitting of serious security practitioners.

Take for instance Securlinx CEO Barry Hodge's comment on the iris attack: "All of these articles obsessing over how to spoof a biometric are intellectually interesting but in the practical application irrelevant".

But nobody should belittle the significance of these sorts of results - especially when no practical biometric can be revoked and reissued after compromise.

Mr Hodge, security is an intellectually challenging field. Let's compare the biometrics industry's complacency with the way serious security professionals responded to the problems discovered in the SHA-1 hash algorithm.

Ideal hash algorithms are supposed to produce digest values that are truly random under any variation to the input data. Any ability to predict how a digest varies could conceivably lead to a number of attack scenarios including where digitally signed data might be tampered with, without impacting the signature. The only way to attack an ideal hash algorithm is by brute force: if an attacker wishes to synthesise a piece of data that produces a target hash value (a so-called "collision") they have to work their way through all possible permutations. For a 160 bit hash value, this brute force task taks on the order of 2 to the power of 159 trials, which would be beyond the power of all the world's computers running for millions of years.

In 2005, Chinese academic cryptologists discovered a weakness in the SHA-1 algorithm, that under some circumstances allows a reduction in the number or trials needed for brute force discovery of collisions. The researchers did not reduce the number of trials by very much, and they did not demonstrate any actual attack. No one else feared that this work would produce a practical exploit, and in the following eight years there still has been no report of an attack on SHA-1.

However, cryptographers, security strategists and policy makers worldwide were shaken by the SHA-1 research. They were deeply worried, intellectually, that a digest algorithm could have a structural weakness that compromises its randomness. It meant that the cryptographic community did not really understand SHA-1 as well as they might. And the policy response was swift. The US government sponsored a new digest algorithm competition, which yielded the SHA-2 algorithm, which is now being promulgated globally.

This is good security practice at work. Academics continuously work away at stressing existing techniques and uncovered weaknesses. To stay ahead of attackers, all verified academic weaknesses are taken seriously, and where critical security infrastructure is involved - even when no practical attack has yet been seen - we review and upgrade the security solutions, to stay ahead of our adversaries.

In stark contrast, biometrics advocates seem to fall back on a variation of the Bart Simpson Defence, namely, "I didn't do it; nobody saw me do it; you can't prove a thing".

Over the past few years I've tracked biometrics proponents claim firstly that templates cannot be reverse engineered. They went on to qualify their position to say that certain types of biometrics are "practically impossible" to reverse. And now Mr Hodge is saying it doesn't really matter if they are reversed.

There is no disaster recovery plan in biometrics; they cannot be cancelled and reissued, so of course their advocates cling to the idea they cannot be compromised. And with that attitude they further distinguish themselves in infosec, for no one else ever acts as though their technology is perfect.

Posted in Security, Biometrics

Understanding biometrics and their necessary fallibility

Most lay people get their understanding of biometrics from watching science fiction movies, where people stare at a camera and money comes out. And unfortunately, some biometrics vendors even use sci-fi films in their sales presentations as if they're case studies. In reality, biometrics just don't work as portrayed.

Here we'll spend just five or ten minutes looking a bit more deeply, to help set reaslistic expectations of this technology.

In practice, the most important thing about biometrics is their fallibility. Because of the vagaries of human traits and the way they vary from day to day, biometrics have to cope with the same person appearing a little different each time they front up. Inevitably this means that occasionally a biometric system will confuse one person with another. So what? Well, there are two major foibles of all biometrics that go unmentioned by most vendors:

1. There is an inherent trade off in all biometrics, between their ability to discriminate between different people (specificity) and their ability to properly recognise all users (sensitivity). You can't have it both ways; a system that is very specific will be more inclined to reject a legitimate user, and conversely, a system that never fails to recognise you will also tend to occasionally confuse you with someone else. Yet biometrics vendors often quote their best case False Reject and False Accept figures side by side, as if they're achievable simultaneously.

2. The only way to improve sensitivity and specificity at the same time is to tighten the enrolment and scanning conditions and/or the mathematical models that underpin the algorithms. In other words, to make the systems choosier. This is why really serious biometrics like face recognition for passports and driver licences require stringent lighting conditions and image quality, and why we should be wary of biometrics in mobile devices where there is almost no control over lighting and sound.

Uncertainty accumulates

The least technical criticism of biometrics concerns the fallibility of all measurement methods. Cameras, sensors and microphones – like human eyes and ears – are imperfect, and the ability of a biometric authentication system to distinguish between subtly different people is limited by the precision of the input devices.

Even if the underlying biological traits of interest are truly unique, it does not follow that our machinery will be able to measure them faithfully. Take the iris. This biometric is often promoted with the impressive claim that the probability of two individuals’ iris patterns matching is one in ten to the power of 78. These are literally astronomical odds; there are fewer atoms in the universe than 10-to-the-78. Yet does this figure necessarily tell us how accurate the end-to-end biometric system really is? Consider the fact that there are ten billion stars in the Milky Way. If two people look up in the night sky and each pick a star at random, is the probability of a match one in ten billion? Of course not, because of the limits of our measurement apparatus, in this case the naked eye. Interference too affects the precision of any measurement; the odds of two people in a big city picking the same star might be no better than one in a hundred.

The Sensitivity-Specificity tradeoff: False Positives and False Negatives

Biometric authentication entails a long chain of processing steps, all of which are imperfect. Each step introduces a small degree of uncertainty, as shown in the schematic below. Uncertainty is inescapable even before the first processing step, because the body part being measured can never appear exactly the same. The angle and pressure of a finger on a scanner, the distance of a face from a camera, the tone and volume of the voice, the background noise and lighting, the cleanliness of a lens all change from day to day. A biometric system cannot afford to be too sensitive to subtle variations, or else it can fail to recognise its target; a biometric must tolerate variation in the input, and inevitably this means the system can sometimes confuse its target for someone else.

Lockstep FAR FMR explanation (0 4) A


Therefore all biometric systems inevitably commit two types of error:

1. A “False Negative” is when the system fails to recognise someone who is legitimately enrolled. False Negatives arise if the system cannot cope with subtle changes to the person’s features, the way they present themselves to the scanner, slight variations between scanners at different sites, and so on.

2. A “False Positive” is when the system confuses a stranger with someone else who is already enrolled. This may result from the system being rather too tolerant of variability from one day to another, or from site to site.

False Positives and False Negatives are inescapably linked. If we wish to make a given biometric system more specific – so that it is less likely to confuse strangers with enrolled users – then it will inevitably become less sensitive, tending to wrongly reject legitimate enrolled users more often.

The following schematics illustrate how a highly specific biometric system tends to commit more False Negatives, while a highly sensitive system exhibits relatively more False Positives.

Lockstep FAR FMR explanation (0 4) B


A design decision has to be made when implementing biometrics as to which type of error is less problematic. Where stopping impersonation is paramount, such as in a data centre or missile silo, a biometric system would be biased towards false negatives. Where user convenience is rated highly and where the consequences of fraud are not irreversible, as with Automatic Teller Machines, a biometric might be biased more towards false positives. For border control applications, the sensitivity-specificity trade-off is a very difficult problem, with significant downsides associated with both types of error – either immigration security breaches, or long queues of restless passengers.

Any biometric system, in principle at least, can be tuned towards higher sensitivity or higher specificity, depending on the overall desired balance of security versus convenience. The performance at different thresholds is conventionally shown by a "Detection Error Tradeoff" (DET) curve.

Biometrics vendors tend to keep their DET curves confidential, and usually release commercial solutions where the ratio of False Accept Rate (FAR) to False Reject Rate (FRR) is fixed. The following DET curves are over ten years old but they remain some of the few examples that are publicly available, and they usefully compare several biometric technologies side by side.

DET Curve Illustrations for blog post 120511


Ref: "Biometric Product Testing Final Report" Issue 1.0, 2001 by the UK Government Communications Electronics Security Group (CESG).

Vendors occasionally specify the "Equal Error Rate" for their solutions. It's important to understand what this spec is for. No real world biometric that I'm aware of is deployed with FAR and FRR tuned to be the same. Instead, the EER should be used as a benchmark for broadly comparing different technologies.

EER provides another useful ready reckoner. If a vendor specifies for example FAR = 0.0001% and FRR = 0.01% and yet you find that the EER is, say, 1% -- that is, greater than both the quoted FAR and FRR -- then you know that the vendor is quoting best case figures that cannot be realised simultaneously. Just look at the DET curves above. When False Accept Rate is 0.1% (ie false positives of 1 in a 1000) the False Reject Rate for ranges from at least 5% to as much as 30%. And we can see that an FAR of 0.0001% is really extreme; for most biometrics, such specificity leads to False Rejects of one in two or worse, rendering the solution unusable.

Failure To Enrol

Over and above the issues of False Positives and False Negatives is the unfortunate fact that not everyone will be able to enrol in a given biometric authentication system. At its extremes, this reality is obvious: individuals with missing fingers, or a severe speech impediment for example, may never be able to use certain biometrics.

However, failure to enrol has a deeper significance for more normal users. To minimise False Positives and False Negatives at the same time (as illustrated in the next figiure), a biometric method generally must tighten requirements on the quality of its input data. A fingerprint scanner for instance will perform better on high definition images, where more fingerprint features can be reliably extracted. If a fingerprint detector sets a relatively stringent cut-off for the quality of the image, then it may not be possible to enrol people who happen to have inherently faint fingerprints, such as the elderly, or those with particular skin conditions.

DET Curve Illustrations for blog post 120511 (EER)


More subtle still is the effect of modelling assumptions within biometric algorithms. In order to make sense of biological traits, the algorithm has to have certain expectations built into it as to how the features of interest generally appear and how those features vary across the population; after all, it is the quantifiable variation in features which allows for different individuals to be told apart. Therefore, face and voice recognition algorithms in particular might be optimised for the statistical characteristics of certain racial groups or nationalities, making it difficult for people from other groups to be enrolled.

The impossibility of enrolling 100% of the population into any biometric security system has important implications for public policy. Clearly there can be at least the perception of discrimination against certain minority groups, if factors like age, foreign accent, ethnicity, disabilities, and/or medical conditions impede the effectiveness of a biometric system. And careful consideration must be given to what fall-back security provisions will be offered to those who cannot be enrolled. If there is a presumption that a biometric somehow provides superior security, then special measures may be necessary to provide equivalent security for the un-enrolled minority.

Posted in Biometrics

A penny for your marketable thoughts?

Most people think that Apple's Siri is the coolest thing they've ever seen on a smart phone. It certainly is a milestone in practical human-machine interfaces, and will be widely copied. The combination of deep search plus natural language processing (NLP) plus voice recognition is dynamite.

And Siri also marks a new milestone in privacy invasion. I predict Siri will become the poster girl for PII piracy, the exemplar of the sly bargain for Personal Information at the heart of most social media.

If you haven't had the pleasure ... Siri is a wondrous new function built into the latest iPhone. It’s the state-of-the-art in artificial intelligence and NLP. You speak directly to Siri, ask her questions (yes, she's female) and tell her what to do with many of your other apps. Siri integrates with mail, text messaging, maps, search, weather, calendar and so on. Ask her "Will I need an umbrella in the morning?" and she'll look up the weather for you – after checking your calendar to see what city you’ll be in tomorrow. It's amazing.

Natural Language Processing is a fabulous idea of course. It radically improves the usability of smart phones, and even their safety with much improved hands-free operation.

An important technical detail is that NLP is very demanding on computing power. In fact it's beyond the capability of today's smart phones, even if each of them alone is more powerful than all of NASA's computers in 1969!. So all Siri's hard work is actually done on Apple's mainframe computers scattered around the planet. That is, all your interactions with Siri are sent into the cloud.

Imagine Siri was a human personal assistant. Imagine she's looking after your diary, placing calls for you, booking meetings, planning your travel, taking dictation, sending emails and text messages for you, reminding you of your appointments, even your significant other’s birthday. She's getting to know you all the while, learning your habits, your preferences, your personal and work-a-day networks.

And she's free!

Now, wouldn't the offer of a free human PA strike you as too good to be true?

Indeed it would. So realise this about Siri: she's continuously reporting back to Apple about your every move. If Apple were a PA placement agency, what they get in return for the free secretarial services is a full transcript of all you've said, everyone you've been in touch with, everything you've done. Apple won't say what they plan to do with all this data, how long they'll keep it, nor who they'll share it with. Apple's Privacy Policy (dated October 2011, accessed 12 March 2012) doesn't even mention Siri nor the collection of the voice-to-text data.

When you dictate your mails and text messages to Siri, you’re providing Apple with content that's usually off limits to carriers, phone companies and ISPs. Siri is an end run around telecommunicationss intercept laws.

Of course there are many, many examples of where free social media apps mask a commercial bargain. Face recognition is the classic case. It was first made available on photo sharing sites as a neat way to organise one’s albums, but then Facebook went further by inviting photo tags from users and then automatically identifying people in other photos on others' pages. What's happening behind the scenes is that Facebook is running its face recognition templates over the billions of photos in their databases (which were originally uploaded for personal use long before face recognition was deployed). Given their business model and their track record, we can be certain that Facebook is using face recognition to identify everyone they possibly can, and thence work out fresh associations between countless people and situations accidentally caught on camera. Combine this with image processing and visual search technology (like Google's "Goggles") and the big social media companies have an incredible new eye in the sky. They can work out what we're doing, when, where and with whom. Nobody will need to like expressly "like" anything anymore when Facebook can see what cars we're driving, what brands we're wearing, where we spend our vacations, what we're eating, what makes us laugh. Apple, Facebook and others have understandably invested hundreds of millions of dollars in image recognition start-ups and intellectual property; with these tools they convert the hitherto anonymous image collections in Picassa, Flickr and the like into content-addressable PII gold mines. It's the next frontier of Big Data.

Now, there wouldn't be much wrong with these sorts of arrangements if the social media corporations were up-front about them. In their Privacy Policies they should detail what Personal Information they are extracting and collecting from all the voice and image data; they should explain why they collect this information, what they plan to do with it, how long they will retain it, and how they promise to limit secondary usage. They should explain that biometrics technology allows them to generate brand new PII out of members' snapshots and utterances. And they should acknowledge that by rendering data identifiable, they become accountable in many places under privacy and data protection laws for its safekeeping as PII. It's just not good enough to vaguely reserve their rights to "use personal information to help us develop, deliver, and improve our products, services, content, and advertising". They should treat their customers -- and all those innocents about whom they collect PII indirectly -- with proper respect, and stop pretending that 'service improvement' is what they're up to.

Siri along with face recognition herald a radical new type of privatised surveillance, and on a breathtaking scale. While Facebook stealthily "x-ray" photo albums without consent, Apple now has even more intimate access to our daily routines and personal habits. And they don’t even pay as much as a penny for our thoughts.

As cool as Siri may be, I myself will decline to use any natural language processing while the software runs in the cloud, and while the service providers refuse to restrain their use of my voice data. I'll wait for NLP to be done on my device with my data kept private.

And I'd happily pay cold hard cash for that kind of app, instead of having an infomopoly embed itself in my personal affairs.

Posted in Social Media, Privacy, Language, Biometrics, Big Data, Social Networking

That's what I call hype

A modest little quote from a biometrics expert caught my eye this week. Neil Fisher, VP of Global Security Solutions at Unisys was cited describing the False Acceptance Rate of iris scanning as "in the region of 0.1%". See Believing in biometrics, at "Airport Technology", http://www.airport-technology.com/features/featurebelieving-in-biometrics.

This figure is, to put it mildly, rather less than what we've been led to believe by iris scanning proponents over the years.

It is widely reported that the probability of two randomly selected irises matching is one in 10 to the power of 78 [1]. This is indeed a staggering denominator, far greater than the number of stars in all the galaxies in all the universe [Yet that number is near meaningless if the iris scanning equipments isn't perfect. Consider that there are 100 billion stars in the Milky Way but that figure doesn't predict the odds of two people picking out the same star with the naked eye, which is one in a few hundred or worse depending on the lighting conditions.]

Yet the recognised inventor of iris recognition, John Daugman of Cambridge University, never claimed his method was as good as all that. In 2000, Daugman published a technical paper [2] on iris detection decision thresholds. Based on data from an ophthalmology research database, his calculations implied [3] a False Match rate as low as one in 10 to the power of 14.

In 2005 Daugman experimentally verified his very low error rate claim using data on over 600,000 individuals sampled in the United Arab Emirates' immigration security system [4]. He reported that "False Match rate is less than 1 in 200 billion" or one in 10 to the power of 11. But it should have been clear to all that the result would be very best case, for border security biometrics systems impose tight control over image quality and lighting conditions for both enrolment and subsequent capture events; without such control, measurement fidelity suffers.

And indeed, independent government testing of iris biometrics, while impressive, show error rates millions of times worse than Daugman's estimates. For example, the UK Government in 2001 found a False Match rate of 0.0001% or one in a million [5].

And now we have a leading biometrics implementer say that in practice, the iris False Match Rate is typically 0.1% or a pretty ordinary one in 1,000. If that's the real life benchmark, then the folkloric figure of one in 10 to the power of 78 represents an exaggeration of one thousand, trillion, trillion, trillion, trillion, trillion, trillion times.

Literally.




[1] See e.g. http://www.aditech.co.uk/irisrecognitiontechnology.html. Or try Googling iris "1 in 10 to the power of 78".
[2] Biometric decision landscapes, Daugman, 2000; http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-482.pdf
[3] http://www.sans.org/reading_room/whitepapers/authentication/dont-blink-iris-recognition-biometric-identification_1341.
[4] Results from 200 billion iris cross-comparisons John Daugman, University of Cambridge Computer Laboratory, http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-635.pdf.
[5] Biometric Product Testing Final Report, Issue 1.0; Mansfield et al, Centre for Mathematics and Scientific Computing, National Physical Laboratory, for the UK Government Communications Electronics Security Group (CESG) Biometric Test Programme, 2001; http://www.cesg.gov.uk/publications/Documents/biometrictestreportpt1.pdf.

Posted in Biometrics

The birthday paradox and biometrics

The inventor of forensic DNA testing, Dr Alec Jeffreys, has cautioned that once millions of DNA samples are collected in population databases, false matches rise significantly.

DNA testing is not an infallible proof of identity. While Jeffreys' original technique compared scores of markers to create an individual "fingerprint," modern commercial DNA profiling compares a number of genetic markers - often 5 or 10 - to calculate a likelihood that the sample belongs to a given individual.

Jeffreys estimates the probability of two individuals' DNA profiles matching in the most commonly used tests at between one in a billion or one in a trillion, "which sounds very good indeed until you start thinking about large DNA databases." In a database of 2.5 million people, a one-in-a-billion probability becomes a one-in-400 chance of at least one match.

[Ref: DNA Fingerprint Privacy Concerns Jill Lawless, CBS News.]

Dr Jeffreys is alluding to the Birthday Paradox, where the chance of any pair of people being matched on a random trait rises dramatically and counter-intuitively in groups of people. At a gathering of just 25 people, the chances are better than 50:50 that a pair of people in the group will have the same birthday. The implication for forensic databases is that it's highly likely that somewhere in the set, there will be pairs of different people that happen to have biometric data that fall within the tolerance of the matching algorithm. In other words, the matching software will confuse them. The designers of driver licence and immigration databases need to put protocols in place that double-check automatic matches so as to avoid impugning innocent people. By-and-large, the protocols I have seen in practice work well, but these practicalities are glossed over by biometrics vendors who continue to over-hype their technologies.

In the context of population databases, we see once again why the adjective "unique" is a wrong and misleading way of describing biometrics. No biometric trait has a zero probability of a false match, so none of them can be described as "unique". And even the highly distinctive traits like DNA can lead to surprisingly frequent false detects in large databases.

So it bears repeating, biometrics don't work as well as suggested by science fiction movies.

Posted in Biometrics