Types of Personal Information

It’s vital that technologists, software developers, architects and analysts appreciate that privacy law takes a broad view of “Personal Information” and how it may be collected. In essence, whenever any information pertaining to an identifiable individual comes to be in your IT system by whatever means, you may be deemed to have collected Personal Information for the purposes of the law (for example, Australia’s Privacy Act 1988 (Cth)). And what follows from any collection of Personal Information (PI) is a range of legal obligations relating to the 10 National Privacy Principles.

A while back I tried to illuminate the problem space from a technologist’s standpoint, in a paper called “Mapping privacy requirements onto the IT function” (Privacy Law & Policy Reporter, 2003). At the time it seemed useful to me to break down different types of Personal Information, because I had found that most application developers only thought about questionnaires and web forms. I wrote then:

Personal data collection can be considered under five categories:
(1) overt collection via application forms, web forms, call centres, face to face interviews, questionnaires, warranty cards and so on;
(2) automatic collection, especially via audit logs and transaction histories;
(3) generated data, which includes evaluative data and inferences drawn from collected data, for the purposes of service customisation (for example buying preferences), business risk management (such as insurance risk scores from claims histories) and so on;
(4) acquired data which has been transferred from a third party, with or without payment for the data, including cases where personal information is acquired as part of a corporate takeover; and
(5) ephemeral data, which is a special category of automatic or generated data, produced as a side effect of other operations. Ephemeral data is reasonably presumed to be transient but can be inadvertently retained. For example, some systems prompt users for pre-arranged challenge-response information — classically their mother’s maiden name — when dealing with a forgotten password. The data provided can be left behind in computer memory or logs, or even scribbled on a sticky note by a help desk operator, and this can represent a major privacy breach if it is not protected from unauthorised parties.
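
As a rough illustration of how an engineer might put these five categories to work, here is a minimal Python sketch of a data inventory tagged by collection category. Everything in it is an assumption made for illustration: the names CollectionCategory, DataHolding, group_by_category and INVENTORY, and the example field names, are hypothetical and do not come from the 2003 paper or from any privacy law.

```python
from dataclasses import dataclass
from enum import Enum


class CollectionCategory(Enum):
    """The five collection categories listed above (labels are illustrative)."""
    OVERT = "overt"          # forms, call centres, interviews, warranty cards
    AUTOMATIC = "automatic"  # audit logs, transaction histories
    GENERATED = "generated"  # evaluative data and inferences
    ACQUIRED = "acquired"    # transferred from a third party
    EPHEMERAL = "ephemeral"  # side-effect data presumed to be transient


@dataclass(frozen=True)
class DataHolding:
    """One item of Personal Information held somewhere in a system."""
    name: str                     # hypothetical field identifier
    category: CollectionCategory  # how the item came to be collected
    source: str                   # free-text note on where it came from


def group_by_category(holdings):
    """Return a dict mapping every category to the names of its holdings."""
    grouped = {category: [] for category in CollectionCategory}
    for holding in holdings:
        grouped[holding.category].append(holding.name)
    return grouped


# Hypothetical inventory; the entries are made up for this example.
INVENTORY = [
    DataHolding("signup_form.email", CollectionCategory.OVERT, "web form"),
    DataHolding("audit_log.client_ip", CollectionCategory.AUTOMATIC, "audit log"),
    DataHolding("risk_model.insurance_score", CollectionCategory.GENERATED, "claims history"),
    DataHolding("crm_import.phone", CollectionCategory.ACQUIRED, "third party"),
    DataHolding("helpdesk.challenge_answer", CollectionCategory.EPHEMERAL, "password reset"),
]

if __name__ == "__main__":
    for category, names in group_by_category(INVENTORY).items():
        print(category.value, names)
```

Because every category appears in the grouped result even when it is empty, a review of the output makes it obvious when a team has only thought about overt collection and has recorded nothing under the other four headings.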

This categorisation may still be a useful orientation for many engineers and technologists. They need to remember that even if it’s found lying around in the public domain, or even if they’ve conjured it up from Big Data by clever data analysis, if they have got their hands on Personal Information, then they have collected it.

Speaking of Big Data, I wonder if the categorisation of Personal Data could now be improved or extended?