Over the past few decades, the volume of digital data has grown significantly. A high proportion of that data is personal data, which is a useful and valuable source of information to organisations, both public and commercial, and to researchers.
Anonymisation of data is not a new process. It has historically been used when transferring personal data to researchers, such as medical records, website traffic and online purchasing records, and for archiving purposes. With the ‘Internet of Things’ looming, research into the patterns in personal data generated by internet-connected devices will be vital for organisations seeking to deliver a well-connected experience and to push appropriate products, services and information to us as users of those devices.
Whilst some people may have reservations about connected devices and the tracking that results from them, others, most likely those brought up in the connected digital age, may not share those concerns. However, it has always been the first responsibility of government to protect its citizens. In the modern age, the threat is less likely to come from marauding hordes than from online abuses and misuses of personal data and, of course, from the ordinary person’s lack of understanding of the risks of their own actions. The UK government and the EU have therefore enacted laws to protect data subjects from others and from themselves, including Regulation (EU) 2016/679, known as the General Data Protection Regulation (GDPR), and the Data Protection Act 2018 (DPA 2018).
It is in this environment that researchers and organisations operate. Commercial organisations have a vested interest in connecting products and services to data subjects and returning value to shareholders and investors. Public organisations have a vested interest in connecting with data subjects and delivering services to them by the means most efficient for the taxpayer. Researchers have a vested interest in analysing and understanding the patterns and indicators in data subjects’ personal data so that they can anticipate such things as the potential for ill health, a likely next purchase or a propensity for a particular behaviour; researchers simply love to explore and interpret data. The key point is that these organisations and researchers want our personal data, and our personal data is therefore valuable.
Over time, the volume of stored personal data grows until some of it needs to be archived and retained, for example for record keeping, while other data may be requested by researchers for a new project or study. Both requirements involve the processing of personal data. When that data is moved out of regular digital systems and transferred for archiving or research, data protection law requires it to be kept secure, and a common method used by organisations is to anonymise it. The Information Commissioner’s Office (ICO), the UK data protection regulator, and the European Data Protection Board (EDPB) suggest that anonymisation ‘safeguards individuals’ privacy’ and is a ‘good strategy’ for the secure transfer of personal data.
What is the anonymisation of personal data? When anonymisation is done properly, all personal identifiers are removed to the point where the data subject is no longer identified outright, nor identifiable with the addition of other information. The anonymised data therefore falls outside the scope of the GDPR, as it is no longer classed as personal data. It is probably fair to say that an ordinary person, asked what anonymised personal data meant, would answer something like ‘it is secured’ or ‘I am unidentifiable’. The ordinary person believes in the absoluteness of anonymisation. However, those who work in data protection understand that anonymised does not necessarily mean secure and unidentifiable.
In the 2000s, three large databases that were said to have been anonymised, with all identifiable personal information removed, were de-anonymised and individual data subjects were clearly identified. This was possible because data records show patterns, and those patterns do not need to be attached to a clearly identified data subject in order to identify them. Human beings leave trails: digital footprints that point to who they are, what type of person they are, their race and gender, and where they live, eat, drink, work, socialise and more. Combined, those footprints can help the data user, such as a researcher, to triangulate back to the individual data subject.
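To make the linkage idea concrete, here is a minimal Python sketch. It assumes two entirely hypothetical data sets: an ‘anonymised’ health extract from which names have been removed but postcode, date of birth and sex remain, and a public register that lists names against the same three attributes. Joining the two on those attributes (often called quasi-identifiers) re-identifies anyone whose combination of values is unique in the register. The records, field names and the `link` function are invented for illustration and do not reproduce any of the real cases.

```python
from collections import defaultdict

# Hypothetical 'anonymised' extract: names removed, quasi-identifiers retained.
anonymised_records = [
    {"postcode": "AB1 2CD", "birth_date": "1970-03-14", "sex": "F", "diagnosis": "asthma"},
    {"postcode": "AB1 2CD", "birth_date": "1985-11-02", "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "XY9 8ZW", "birth_date": "1962-07-30", "sex": "F", "diagnosis": "hypertension"},
]

# Hypothetical public register listing names against the same attributes.
public_register = [
    {"name": "Alice Example", "postcode": "AB1 2CD", "birth_date": "1970-03-14", "sex": "F"},
    {"name": "Bob Example", "postcode": "AB1 2CD", "birth_date": "1985-11-02", "sex": "M"},
    {"name": "Carol Example", "postcode": "XY9 8ZW", "birth_date": "1962-07-30", "sex": "F"},
    {"name": "Dana Example", "postcode": "XY9 8ZW", "birth_date": "1962-07-30", "sex": "F"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_date", "sex")


def quasi_key(record):
    """Build a tuple of the quasi-identifier values used to join the data sets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymised, public):
    """Return (name, diagnosis) pairs where exactly one public record matches."""
    names_by_key = defaultdict(list)
    for person in public:
        names_by_key[quasi_key(person)].append(person["name"])
    reidentified = []
    for record in anonymised:
        candidates = names_by_key.get(quasi_key(record), [])
        if len(candidates) == 1:  # a unique match singles the person out
            reidentified.append((candidates[0], record["diagnosis"]))
    return reidentified


print(link(anonymised_records, public_register))
# → [('Alice Example', 'asthma'), ('Bob Example', 'diabetes')]
```

Only the third record escapes, and only because two people in the hypothetical register happen to share its postcode, date of birth and sex.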
A ‘truly anonymised’ personal data set is of little value to researchers, because they need the very patterns that make us identifiable in order to understand what the data tells them: a data subject’s propensity for ill health, so that proactive treatments can be developed; how people use their internet-connected devices, so that those devices can deliver better functionality; what search terms they use, so that targeted advertising can be delivered; and so on.
If a personal data set is anonymised for archiving purposes, this implies that at some future point someone may need to de-anonymise the data to retrieve information, and therefore that ‘someone’ holds the ‘key’ to unlock the data set and reveal the personal data within it. Because ‘other information [e.g., the key]’ exists that enables the individuals to be re-identified, the ‘anonymised’ data set has in fact only been pseudonymised and remains personal data. This interpretation appears to be confirmed by the European Commission’s answer to the question ‘What is personal data?’, which states: ‘For data to be truly anonymised, the anonymisation must be irreversible’.
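The role of the ‘key’ can be sketched in a few lines of Python, assuming a simple scheme in which each name is replaced by a random token and a lookup table from tokens back to names is kept for later retrieval. This is roughly what the GDPR calls pseudonymisation (art. 4(5)). The functions, field names and records are hypothetical, and real archives may use other techniques such as keyed hashing or encryption.

```python
import secrets


def pseudonymise(records, id_field="name"):
    """Replace each identifier with a random token; return the masked records and the key."""
    key = {}        # token -> original identifier: this lookup table is the 'key'
    token_for = {}  # original identifier -> token, so a repeat subject keeps one token
    masked = []
    for record in records:
        identifier = record[id_field]
        if identifier not in token_for:
            token = secrets.token_hex(8)
            token_for[identifier] = token
            key[token] = identifier
        copy = dict(record)
        copy[id_field] = token_for[identifier]
        masked.append(copy)
    return masked, key


def reidentify(records, key, id_field="name"):
    """Whoever holds the key can reverse the process and restore the identifiers."""
    return [{**record, id_field: key[record[id_field]]} for record in records]


archive = [
    {"name": "Alice Example", "diagnosis": "asthma"},
    {"name": "Bob Example", "diagnosis": "diabetes"},
]
masked, key = pseudonymise(archive)
restored = reidentify(masked, key)
print(restored == archive)  # → True
```

Anyone holding `key` can reverse the process, which is why such a data set remains personal data. Destroying the key removes that particular route back to the individual but, as the de-anonymisation cases described earlier show, the attributes left in the records may still allow re-identification.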
This distinction between anonymised and truly anonymised personal data is important for organisations and data protection officers (DPOs) to understand, because personal data must attract greater security than data that is not defined as personal data. DPOs also have to respond to subject access requests (SARs), as data subjects now have greater access to, and transparency of, their personal data, and organisations face more significant and potentially punitive corrective measures. If ‘anonymised’ personal data exists for which there is a ‘key’, or to which information could be added to de-anonymise it, organisations and DPOs need to know of its existence so that it can be closely managed and SARs answered fully.
Notes and References
 Calculated at 2.6 exabytes in 1986 (20% analogue / 80% digital), 15.8 exabytes in 1993 (31% analogue / 67% digital), 54.4 exabytes in 2000 (3% analogue / 97% digital) and 295 exabytes in 2007 (1% analogue / 99% digital). Information from Hilbert M., López P., ‘The World’s Technological Capacity to Store, Communicate, and Compute Information’ (Science 332(6025):60-5, February 2011) <https://science.sciencemag.org/content/332/6025/60/tab-pdf> accessed 13 April 2019
 GDPR definition: ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person. Regulation (EU) 2016/679, art. 4(1)
 “It is the first responsibility of government in a democratic society to protect and safeguard the lives of its citizens.” Lord Hope of Craighead, A v Secretary of State for the Home Department  2 AC 68 para 99
 Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) OJ L119
 Data Protection Act 2018 c. 12
 GDPR definition: ‘processing’ means any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction. Regulation (EU) 2016/679, art. 4(2)
 ICO, ‘Anonymisation: managing data protection risk code of practice’ (ICO, 2012) <https://ico.org.uk/media/for-organisations/documents/1061/anonymisation-code.pdf> accessed 5 April 2019
 Article 29 Working Party, ‘Opinion 05/2014 on Anonymisation Techniques WP216’ (European Data Protection Board, 2014)
 n vii.  3.
 n viii.  1.
 The GDPR’s material scope ‘applies to the processing of personal data’. If the data neither identifies a data subject nor can identify one even with additional information, it falls outside the scope of the GDPR. Regulation (EU) 2016/679, art. 2(1), rec. 15; see also rec. 26 on anonymous information
 The AOL Data Release, the 1990 Census Data and the Netflix Prize Data Study, as covered in Ohm P., ‘Broken Promises Of Privacy: Responding To The Surprising Failure Of Anonymization’ (UCLA Law Review, 2010) pages 1717 to 1722 <https://www.uclalawreview.org/broken-promises-of-privacy-responding-to-the-surprising-failure-of-anonymization-2/> accessed 5 April 2019
 In the summary of a presentation given to the ICO Privacy and Data Anonymisation Seminar on 30 March 2011 ‘From Data to Health’ by Sir Mark Walport, Director of The Wellcome Trust. Walport links good data to good public health and the ‘ability to link large datasets on, for example, health, housing, the environment’. He also identifies that these ‘linkages’ cannot be completed with ‘truly anonymised data’ but will materially depend upon pseudonymised data sets. ICO, ‘Summary of ICO privacy and data anonymisation seminar’ (ICO, London, 20 March 2011) <https://ico.org.uk/media/1042332/anonymisation-seminar-report.pdf> accessed 07 April 2019
 Lord Hope of Craighead in Common Services Agency v Scottish Information Commissioner referring to personal data that might be provided in an anonymised form ‘the individuals could be identified either from the data themselves or from the data and other information [i.e., the key] in the possession of ISD’. Common Services Agency v Scottish Information Commissioner (Scotland)  UKHL 47,  
 Reform of EU data protection laws, ‘What is personal data?’ (European Commission) <https://ec.europa.eu/info/law/law-topic/data-protection/reform/what-personal-data_en> accessed 10 April 2019