In praise of metadata

The term metadata has become rather loaded, perhaps even poisoned, by its association with telecommunications surveillance. But I want to sing its praises, because it is metadata that tells us whether any given information is accurate or reliable. The fitness for purpose and value of any data lies in metadata.

National security hawks advocating stronger surveillance powers have tried to whitewash metadata collection. They have likened telecommunications metadata to the visible details on an ordinary envelope and insist that it’s innocuous compared with the contents of the message.

On the other hand, U.S. General Michael Hayden, former head of the National Security Agency, once stated plainly and simply “we kill people based on metadata”.

Data and metadata: same but different

From a privacy perspective, metadata should not be distinguished from data in general.

As I understand prevailing principles-based privacy law, if a piece of metadata is personally identifiable, then it constitutes personal data and falls within the scope of such law.

So from that perspective, metadata is merely more data.

Nevertheless, I find it useful to distinguish data and metadata, because the properties of data that make it valuable or reliable (or unreliable) are often codified in metadata.

For example:

  • the age of a cell phone number or email address matters: a very new one may be a burner account being used in a fraud
  • data presented online for identification purposes or in card-not-present payments really needs to be “original” in the sense that it’s presented by the rightful subject
  • watermarks generated within digital camera hardware can prove that an image is genuine, rather than AI-generated
  • clinical trial results should be based on patient data collected under proper consent conditions.

We might argue that the value of any data lies in the metadata.

Verifiable credentials are really about metadata

This distinction between data and metadata also illuminates how verifiable credentials convey rich data quality signals.

Verifiable credentials are tools that convey machine-readable assertions made by a third party about a subject. The subject is usually a human, but the use of verifiable credentials for non-human subjects such as IoT devices is expanding fast.

The important elements of verifiable credentials (and verifiable presentations) are:

  • they usually name or point to the subject of the credential (although this can be indirect, to preserve privacy)  
  • they name the issuer of the credential
  • they bear the digital signature of the issuer, which gives the credential provenance
  • the presentation bears the signature of the subject (ideally generated within a secure wallet), which indicates their consent or control
  • the credential carries a range of administrative metadata, such as validity date, applicable terms & conditions, and details of the device carrying the credential.

The digital signature on a verifiable presentation is typically created automatically in a wallet or chip. The signing process uses a private key embedded in the firmware, unique to the subject, but not visible to them.
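
To make the split between data and metadata concrete, here is a minimal Python sketch of the elements listed above. The field names (subject_ref, issuer_signature, valid_until and so on) and the sample values are hypothetical, chosen only for illustration; they do not follow any particular credential standard or wallet format, and the signatures are placeholder strings rather than computed values.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Credential:
    # Primary data: the assertion itself.
    claims: dict
    # Metadata: what a verifier uses to judge how far to rely on the claims.
    subject_ref: str        # may be an indirect pointer, to preserve privacy
    issuer: str
    issuer_signature: str   # placeholder; a real one is computed by the issuer
    valid_until: date
    terms: str = ""
    device: str = ""


@dataclass
class Presentation:
    credential: Credential
    subject_signature: str  # placeholder; a real one is created in the wallet


def metadata_of(cred: Credential) -> dict:
    """Return every field of the credential except the claims themselves."""
    return {k: v for k, v in vars(cred).items() if k != "claims"}


# Hypothetical example values.
cred = Credential(
    claims={"licence_class": "C"},
    subject_ref="did:example:123",
    issuer="Example Motor Registry",
    issuer_signature="<issuer signature>",
    valid_until=date(2030, 1, 1),
    terms="https://example.org/terms",
    device="secure wallet",
)
presentation = Presentation(credential=cred, subject_signature="<subject signature>")

print(sorted(metadata_of(cred)))
```

In this sketch only one field is primary data. Everything else in the credential, and the subject's signature on the presentation, is there to help a verifier decide whether to rely on that one field.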

The issuer of a credential is one of the most important factors used to determine whether to accept that credential or not. That is, the issuer confers value; some issuers are valued more than others.

So the name of the issuer of the credential is metadata of the credential: it is something that a first party wants to know about a second party before deciding to do a transaction.

About the credit cards in the banner graphic

The banner at the top of this blog shows some favourite images from my archive, from the very first charge card in 1955 through a series of technological innovations. First, the printed cardholder details were coded on a magnetic stripe for automated reading; tougher plastic cards supported antifraud measures like holograms and guilloche printing; the magnetic stripe was superseded by smart chips that prevent copying; and smartcards gave way to smart phones with biometrics to further protect the cardholder.

This evolution is really all about metadata!

Cards and phones, as far as the card payment system is concerned, are just data carriers. They store details about the card holder (in particular the Primary Account Number or PAN) and facilitate the presentation of that data to a merchant. The move from mag stripe to chip was the most important security measure in sixty-odd years; the chip provides signals about the originality of the PAN and the consent of the cardholder to each presentation.

Every major upgrade of credit card technology has improved the metadata that protects the primary data. All along, over several decades, the primary data has remained the same. But it has got better, thanks to metadata.

I analysed the evolution of cards in more detail here: A CMM for personal data carriers and digital wallets.

“Attributes” and “Claims”

The late great Kim Cameron — author of the prized Laws of Identity — carefully used the term “claim” in defining digital identity. The words claim, attribute and assertion might seem interchangeable but Kim singled out claim as “an assertion of the truth of something, typically one which is disputed or in doubt”.

He stressed that there is always uncertainty in the real world, and when authenticating another party, there is always going to be doubt over important attributes.

The trick is to reduce that doubt to acceptable levels. A relying party will always reserve the right to decide for itself if its doubts have been resolved.

Verifiable credentials provide multiple pieces of metadata for doing just that. Think of metadata here as evidence to bolster confidence in an attribute.

For one thing, a verifiable credential bears the name of the credential issuer. In many cases, there is a natural issuer of a credential of interest: driver licences are issued by government departments of motor vehicles, employee numbers are issued by employers, credit card numbers are issued by banks. When verifying a credential, one of the most important things to check is the issuer.
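
A relying party's issuer check might be sketched as follows. This is an illustration under stated assumptions, not a real verification library: the ACCEPTED_ISSUERS policy table, the issuer names and the accept function are all hypothetical, and the sketch assumes the issuer's signature on the credential has already been verified by other means.

```python
from datetime import date

# Hypothetical policy: the issuers this relying party accepts for each
# type of credential. Each relying party decides this for itself.
ACCEPTED_ISSUERS = {
    "driver_licence": {"Example Motor Registry"},
    "employee_number": {"Example Employer Pty Ltd"},
    "credit_card_number": {"Example Bank"},
}


def accept(cred_type: str, issuer: str, valid_until: date, today: date) -> bool:
    """Accept a credential only if its issuer is trusted for that type
    of credential and the credential has not expired."""
    trusted = ACCEPTED_ISSUERS.get(cred_type, set())
    return issuer in trusted and today <= valid_until


today = date(2025, 1, 1)
# A licence from the natural issuer is accepted.
print(accept("driver_licence", "Example Motor Registry", date(2030, 1, 1), today))
# The same licence "issued" by a bank is not.
print(accept("driver_licence", "Example Bank", date(2030, 1, 1), today))
```

Notice that the decision never looks at the licence number itself. It turns entirely on metadata: who issued the credential and whether it is still valid.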

Metadata IRL

This coupling of data and metadata is familiar in the analogue world.

In courtroom dramas, stories turn on facts and evidence. The facts tendered in a court case are only as good as the evidence. And there are rules of evidence governing how information is obtained and safeguarded.

Facts and evidence in court procedures correspond to claims and proofs in digital identity. It’s all data and metadata.

“How do you know?”

In science, it’s not just what you know that matters but how you know it. What is the source of a statement or claim? Where is the evidence? How good is the evidence?

Children know this instinctively. As they develop a sense of how knowledge and trust are fluid, plucky kids will challenge the things they are told, with the riposte “How do YOU know?”.

Metadata and the stories behind the data

Metadata can tell the story behind the data, a story that is increasingly important in all things digital.

As data supply chains become ever more complicated, we need enhanced abilities to interrogate the information we receive and depend on, whether that’s a news report, a photographic image, a student’s essay, a CV, the results of an automobile’s emissions test, or a scientific report on climate change.

Where did a given piece of data come from? Who and/or what contributed to it?

The products of generative AI are starting to be watermarked, but that’s only a start. It will be important to know more, such as which algorithms and version numbers were used, where the models ran, how they were trained, and whether the training data was audited.

Looking at signatures as metadata

With this orientation, everywhere I look now, I see metadata!

A less obvious example of metadata is the digital signature.

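A digital signature is a data value or “cryptogram” calculated from a record (or, more usually, from a hash of the record) using a private key controlled by an actor. Anyone holding the matching public key can check the signature, but only the holder of the private key can create it.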

The signature on a record can be checked at any future time to verify that a particular actor had something to do with that record, such as creating it or agreeing to it.
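
The signing and checking steps can be sketched in a few lines of Python. This is a toy, not a usable scheme: it assumes textbook RSA with no padding, and builds its key pair from two well-known Mersenne primes purely so that the example runs with the standard library alone. The names sign, verify and digest are my own for illustration, and real systems rely on vetted cryptographic libraries and secure key storage.

```python
import hashlib

# Toy key pair for illustration only: textbook RSA, no padding,
# publicly known primes. Never use this for real signatures.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def digest(record: bytes) -> int:
    """Hash the record and reduce the hash into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(record).digest(), "big") % N


def sign(record: bytes) -> int:
    """The signature is the record's hash transformed with the private key."""
    return pow(digest(record), D, N)


def verify(record: bytes, signature: int) -> bool:
    """Anyone holding the public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest(record)


record = b"pay 10 dollars to Alice"
sig = sign(record)
print(verify(record, sig))                         # True
print(verify(b"pay 1000 dollars to Alice", sig))   # False
```

The signature says nothing new about the content of the record. It is evidence about the record: that the holder of a particular private key handled exactly these bytes.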

There are many different applications for digital signatures — but they are all used to create evidence that a given record at a certain time was touched in some way by a certain actor. That is, the signature tells a story about the history of a record. The digital signature is more metadata.