One problem that arises frequently is the inappropriate combining of datasets or linking of records. The problem is most acute for organisations holding significant amounts of data about large numbers of individuals, such as health and care organisations, education providers, government agencies and major corporations. As the number of individuals with whom these bodies interact increases, so does the likelihood that some of those individuals share one or more identifiers, such as a name or date of birth.
Merging records
Recently, the Information Commissioner’s Office (ICO) issued a reprimand to West Midlands Police after it incorrectly linked and merged the records of two separate individuals. The individuals had the same name and the same date of birth, and lived in the same geographic area. Both were victims of crime, but one was also a suspect in a criminal case. The merged record led to errors on at least four occasions, including officers attending the wrong person’s address when responding to significant safeguarding concerns. Sensitive information relating to one individual was also sent to the other. Once the records had been merged, the design of the system meant that they could not be unmerged.
This was a breach of the fourth data protection principle: the requirement to ensure that personal data is accurate. The sixth data protection principle was also breached, as West Midlands Police had failed to implement appropriate technical and organisational measures. Its failure to rectify the situation within an appropriate period also breached the requirement to erase and/or rectify inaccurate data. A seemingly simple error exposed West Midlands Police to reputational, regulatory, litigation and financial risk.
Small errors such as these can also have internal consequences, causing operational issues and diverting resources. For example, a Higher Education Institution at which two staff members shared the same name (though they had different specialisms) received a Data Subject Access Request under Article 15 of the UK GDPR when one of the employees grew frustrated with their employer’s inability to keep each individual’s personnel record separate. As a first step towards making disclosure, the employer had to work out which material in each record belonged to which individual: the outcome the employee had been asking for all along. The rest of the Data Subject Access Request process, and the employee dissatisfaction that triggered the request, could easily have been avoided with proper records management.
Comparing data sets
Organisations sometimes compare datasets against one another to check data accuracy. Comparing datasets without a mechanism to record where ‘matches’ are incorrect is a problem from a GDPR perspective, because the same false match can be made again every time the check is run.
In January 2024, the Guardian newspaper reported on a case in which an apparent match between a pension dataset and the register of deaths caused a retired octogenarian to repeatedly lose access to her pension. The ‘match’ causing the problem was the same each time, raising the question of why the pension provider had not logged that the link had been disproved. Although it was agreed, after media intervention, that the names would be permanently ‘decoupled’, such action can and should be built into the system.
So, what should you do?
Data accuracy must be considered, not overlooked, during the design and implementation stages of any new system. Data protection by design requires a certain level of scenario planning, which should include how individuals with similar names or other shared identifiers will be handled. Even if there is no duplication at present, how will the system cope when another ‘John Smith’ arrives?
Legacy systems may also need to be reviewed to ensure that they allow employees to be properly identified. Some systems distinguish individuals by their unique email addresses, but others may present drop-down lists of employees that show only legal names. An inability to tell which ‘John Smith’ is which can mean that work and benefits are misallocated, and that one employee’s data is inappropriately shared with the other.
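To illustrate the identification point, the minimal Python sketch below contrasts storing staff records keyed by legal name with keying them by a unique internal identifier. The `Employee` fields, the `staff_id` values and the `dropdown_label` helper are all hypothetical, chosen only for illustration; real HR systems will differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    staff_id: str     # hypothetical unique internal identifier
    legal_name: str
    department: str


staff = [
    Employee("E001", "John Smith", "Chemistry"),
    Employee("E002", "John Smith", "History"),
]

# Keyed by legal name: the second John Smith silently overwrites the first.
by_name = {e.legal_name: e for e in staff}

# Keyed by a unique identifier: both records are kept separate.
by_id = {e.staff_id: e for e in staff}


def dropdown_label(e: Employee) -> str:
    """Hypothetical display label giving enough context to tell apart
    people who share a legal name."""
    return f"{e.legal_name} ({e.department}, {e.staff_id})"


print(len(by_name), len(by_id))  # 1 2
print([dropdown_label(e) for e in staff])
```

The name-keyed lookup holds one record where there should be two, which is the kind of silent merge described above. A unique key, together with a more informative label wherever people are picked from a list, avoids it.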
Where datasets are to be compared, the system could be configured so that comparisons run only between ‘new’ data and the dataset being checked for accuracy. Alternatively, a record can be kept of instances of incorrect matching, so that they can be excluded from the results of any future comparison.
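The second option, keeping a record of incorrect matches, might look something like the minimal Python sketch below. It assumes, purely for illustration, that records are matched on name and date of birth and that disproved matches are logged as pairs of record IDs. `Record`, `candidate_matches` and the sample data are hypothetical and do not represent any particular provider’s system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str
    name: str
    date_of_birth: str  # ISO 8601 date, e.g. "1940-03-01"


def candidate_matches(records, reference, rejected):
    """Pair records sharing a name and date of birth (an assumed matching
    rule), skipping any pair previously logged as an incorrect match."""
    # Index the reference dataset by the assumed matching key.
    index = {}
    for ref in reference:
        index.setdefault((ref.name.lower(), ref.date_of_birth), []).append(ref)

    matches = []
    for rec in records:
        for ref in index.get((rec.name.lower(), rec.date_of_birth), []):
            if (rec.record_id, ref.record_id) not in rejected:
                matches.append((rec.record_id, ref.record_id))
    return matches


pensions = [Record("P1", "Jane Doe", "1940-03-01")]
deaths = [Record("D9", "Jane Doe", "1940-03-01")]
rejected = set()

print(candidate_matches(pensions, deaths, rejected))  # [('P1', 'D9')]
rejected.add(("P1", "D9"))  # log that this match has been disproved
print(candidate_matches(pensions, deaths, rejected))  # []
```

Once a match has been checked and found to be wrong, recording it means the same pair is not flagged again on the next run. In practice the log would need to be stored persistently, not held in memory as it is here.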