Privacy, PII and GDPR: The Quest for the Unknown

Broadly, regulations require companies to protect the privacy of their employees and of anyone else whose data they collect. We generally take it for granted that when we give our supermarket our details, it will take good care of our Personally Identifiable Information (PII).

In actual fact, it’s almost a given that some of our personal details are already out there. Helpdesk employees have been known to supplement their incomes by selling batches of client details on the black market, hackers target popular websites for poorly protected client details, disgruntled employees walk out with whatever they can get their hands on, and marketing departments routinely break retention rules without even realising it by using customer details for purposes beyond those for which they were initially collected. (To check if you’ve been subject to malicious attack, check out

Increasingly, companies are having to find ways to cope with security, privacy and retention requirements around Personally Identifiable Information.

What is PII?

It’s important to understand that Personally Identifiable Information is any information which can be used, on its own or together with other information, to identify, locate or contact an individual.

That means, for instance, that an Employee ID (which is pretty easy to scan for) is not PII on its own. But if there’s a spreadsheet where IDs are mapped to names, and another document where they’re mapped to addresses, then the ID is PII.
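
To make the Employee ID example concrete, here is a minimal Python sketch of how two harmless-looking files combine into PII. The IDs, names, addresses and the link_records helper are all invented for illustration.

```python
# Hypothetical data: neither mapping looks like a full personal record on its own.
id_to_name = {"E1001": "Alice Example", "E1002": "Bob Sample"}
id_to_address = {"E1001": "1 High Street, Leeds", "E1003": "9 Mill Lane, York"}


def link_records(names, addresses):
    """Join two ID-keyed mappings; every ID present in both identifies a person."""
    return {
        emp_id: {"name": names[emp_id], "address": addresses[emp_id]}
        for emp_id in names.keys() & addresses.keys()
    }


linked = link_records(id_to_name, id_to_address)
```

Only the ID that appears in both sources ends up linked, which is exactly why the ID has to be treated as PII wherever such mappings exist.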

So the tricky aspect of locating PII isn’t just finding an easily defined ID number; it’s finding any name, organisation, address, phone number, email, and so on, that could be tied to the individual.

Why Should Companies Care about PII?

For starters, this isn’t optional. All organisations are subject to some form of regulation surrounding the treatment of personal information, and in many cases they’re subject to regulations from several countries. What’s more, the GDPR comes into effect on 25 May 2018 and the penalties for non-compliance are severe. Any company holding data on EU citizens needs to be compliant before then, even if it’s based outside Europe.

Waiting for Brexit instead? Experts agree: British companies seeking to work with organisations in the EU will still need to comply with EU legislation and regulations. Companies that do not comply face exclusion from working with EU organisations and citizens and, by extension, with other UK companies that operate internationally.

Finally, with increased public awareness, companies which allow customer or employee data to be lost face damage to their share price, sales figures, reputation with clients and partners, and more.

What makes it so hard?

  1. Varying Requirements. Different countries have completely different requirements with regard to how this data should be treated. Some require disposal after a period of time whereas others don’t.
  2. Additional Complications. Some regulations are extraterritorial. For example, a US company that holds information on EU citizens will still need to comply with the General Data Protection Regulation (GDPR).
  3. Unknown Information. Even if you can navigate the jurisdictional maze, understanding what data you have and how it should be treated can be a tremendous task. Companies largely don’t know what they have – how do you look for what you don’t know?
  4. It’s Mostly Unstructured. Private information is not just kept in nicely organised, well-protected databases. A lot of it resides in documents, spreadsheets and log files stored on email servers and file shares throughout the organisation.
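
To give a feel for the unstructured side of the problem, here is a rough Python sketch that walks a file share and reports which text files contain email-like strings. The pattern, the list of file types and the function names are assumptions made for illustration; a real scan would need far more patterns and file formats than this.

```python
import re
from pathlib import Path

# Deliberately simple pattern: catches common email shapes, not every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical choice of file types to treat as plain text.
TEXT_SUFFIXES = {".txt", ".csv", ".log"}


def find_emails(text):
    """Return the distinct email-like strings in a piece of text, sorted."""
    return sorted(set(EMAIL_RE.findall(text)))


def scan_tree(root):
    """Map each text file under root to the email-like strings found in it."""
    hits = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in TEXT_SUFFIXES:
            found = find_emails(path.read_text(encoding="utf-8", errors="ignore"))
            if found:
                hits[str(path)] = found
    return hits
```

Calling scan_tree on a shared folder returns only the files with hits. Even this toy version shows the catch: it finds what you told it to look for, and nothing else.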

Corporate Policies Not Connected to Data

Most companies will have privacy policies. They typically place a level of responsibility on senior staff to ensure those policies are enacted. These people annually sign off on their department being ‘compliant’ with company policy, because, what else would they do? Less cynically perhaps, they have no evidence to the contrary.

However, having the policy (normally available as a brief PDF document) doesn’t magically make it happen. Hoping that you don’t suffer a data leak won’t prevent it. With the best will in the world, most companies simply don’t have the infrastructure to enact these policies.

There’s a huge disconnect between corporate policy and the level where the information is actually used and stored.

Often the responsibility is simply delegated to employees who are told to be compliant. However, they can’t track all the personal data left lying in unstructured systems such as emails and file shares, and they can’t be expected to be experts on what constitutes privacy data or on which policy should apply to it.

Mostly people want to get on with their jobs, often under strict deadlines, and the easiest way to do that is not to take the extra time to review and appropriately secure all documents they deal with.

Retrospective Searches Not Good Enough

There is often a view that whenever the company enters a dispute with an individual, it can do retrospective searches to find the relevant data and then deal with the matter of privacy data.

In some companies (who shall not be named but include quite a few of the top EU/US financial institutions) I’ve seen collections that consist simply of emails asking “Does anyone have any information on [X]?”. That’s not a defensible collection process. In fact, it’s probably about as far from it as you can get.

Contrary to what this approach suggests, privacy protection isn’t just about the famous “Right to be forgotten”. It’s about preventing personal information being used incorrectly. That means removing CVs floating around mailboxes, storing employee addresses securely, preventing client details being leaked, and much more.

Ultimately, when data is leaked from your company, will it be encrypted or will it be an open spreadsheet with all your board members’ credit card details? (Yes, I’ve found these, unsecured, on file shares.)

This needs a proactive approach.

Proactive Security Should be the First Line of Defence

PII needs to be secured proactively. It’s no good deciding to do it after the data has left the organisation.

Endpoint Protection is a popular approach to preventing data leaks. However, it’s only one aspect of preventing data leakage: it doesn’t protect against more determined and inventive employees, nor does it cover regulatory responsibilities. It’s the last line of defence, but it should never also be the first.

To properly protect PII, the key lies in proactively finding it and applying suitable security and retention policies to it.

Pragmatic Approach

Some like to suggest that this should be part of a “holistic” approach where companies should go through all their systems and data to find PII, and use that to reduce cost and risk while improving productivity.

It’s a great vision, but wildly impractical. I’ve so far only known one company (one of the major banks) to embark on such a program, and so far it hasn’t been successful. Not because it can’t be done, but simply because it’s too big, with too many stakeholders, differing priorities and shifting requirements.

Better to take a pragmatic approach.

  1. Do an Assessment. Have an independent Information Governance assessment done on your organisation. This can be high-level or in-depth, global or within one department, and will give you an objective view of where your priorities might lie. It also helps you measure any improvements down the line.
  2. Focus. Decide where you can get the best results for your investment and/or where your highest risks lie.
  3. Make it a project. Many vendors will tell you that some product will solve all your problems. Whilst the correct software is important, this endeavour can only be successful if it’s run by someone who can manage communications, expectations, negotiations, and more.
  4. Iterate and expand. Apply your project on a small, manageable scale. This will help your project team build the necessary business processes to ensure success. Once these are in place it’s easier to extend the program to other parts of the organisation with minimal project risks.


Gravicus provides a fully standardised Information Governance Assessment which takes into account many factors and results in an objective score with a full analysis to help you easily determine the best place to start looking for PII.

This also enables you to measure the success of your program by giving you a ‘before’ score to compare to the state after the program.


Gravicus has the Osprey Platform, a revolutionary new system for finding PII and performing many other Information Governance practices.

We incorporate Natural Language Processing (NLP) and Pattern Recognition to find occurrences of Names, Places, Organisations, Addresses, Keywords, Credit Card Numbers and lots more, and use advanced visualisation to show where the at-risk documents are, based on the correlations the system has found.

Why use NLP? No need to use predefined lists of words to look for! It’s Gravicus AI reading the documents for sensitive information and tagging that onto each document for analysis.
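
For a sense of what the pattern-recognition side involves in general, here is a small Python sketch that finds card-number candidates and filters them with the standard Luhn checksum. This is a generic illustration and not how Osprey is implemented; the regular expression and function names are assumptions, and the NLP side (names, places, organisations) is not covered here.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def find_card_numbers(text):
    """Return digit strings that look like card numbers and pass the Luhn check."""
    found = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found
```

The checksum step matters because long digit runs are everywhere in business documents (order numbers, references, phone numbers); without it, a pattern-only scan drowns in false positives.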

Once located, simply use DataLines to automatically move and/or secure documents as required. This can be entirely automated for larger datasets.


Software itself isn’t enough. The suggestion that “install this and all your problems go away” is unrealistic. This kind of effort needs to be led by an underlying program which communicates with the business, runs the interviews for the data mapping, ensures proper documentation every step of the way for the regulators, and so on.

A proper software solution will facilitate this process and ensure that whatever steps you need to take towards compliance can be met. However, someone still needs to be in charge of that process.

If you have a team that can manage the program, that’s great – we love it when people are prepared! But if you need a hand, we can provide expertise in this area. We can even wrap the whole program up as a service, working closely with your Business, Records, Legal and IT teams to make sure everything is taken care of according to your own requirements.

Just drop us a line.