1 – You can’t get hold of the critters…
Although some organisations have been diligent in centralising all PST files onto file shares, working with PSTs over a network link is, in fact, unsupported by Microsoft.
By default, PSTs get written to the local drive, but savvy users can create and copy PSTs anywhere on the ‘c:’ drive, other partitioned drives, USB sticks, and external hard drives.
This means your first and most daunting challenge is actually tracking down PST files.
Then, even if you can locate them, getting a long enough time window to do anything about them can be tricky, especially in organisations with a lot of laptop users who connect to the network only intermittently.
Added to this is the fact that PSTs are designed to be accessed by just one application at a time, which means that if the user has Outlook open, any other background application designed to capture PSTs can’t usually access them.
In the face of such adversity, sometimes a technical approach can’t beat a pragmatic approach. For example, a large pharmaceutical company we know ran a ‘PST collection day’ at their corporate team-building events, where road warriors were asked to drop their laptops in a tent on arrival for ‘frisking’.
This might work for some organisations. However, with the proper tooling to collect PSTs, those days are long gone.
A tiered architecture is going to be the way to go for companies that fear the size and topology of their network, and the highly distributed nature of their workforce, will be a barrier to cleaning up PSTs. Ideally, the elements requiring the most computing power are hosted in the cloud (such as Microsoft Azure).
With the heftier parts of the architecture hosted elsewhere, only ‘Agents’ with a minimal software footprint need to be deployed to the locations that need scanning.
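To give a feel for how small the scanning part of such an agent can be, here is a minimal Python sketch. It is an illustration only, not a description of any particular product: the function name `find_pst_files` and the `PstHit` record are made up for this example, and a real agent would also need to handle locked files, scheduling and reporting back to the central service.

```python
import os
from pathlib import Path
from typing import Iterable, Iterator, NamedTuple


class PstHit(NamedTuple):
    """One PST file found on disk (hypothetical record type for this sketch)."""
    path: Path
    size_bytes: int


def find_pst_files(roots: Iterable[Path]) -> Iterator[PstHit]:
    """Walk each root folder and yield every file with a .pst extension."""
    for root in roots:
        # os.walk silently skips folders it cannot read, which suits a
        # best-effort scan running in the background.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if not name.lower().endswith(".pst"):
                    continue
                full_path = Path(dirpath) / name
                try:
                    size = full_path.stat().st_size
                except OSError:
                    # The file vanished or is inaccessible; skip it for now.
                    continue
                yield PstHit(full_path, size)
```

Pointing it at every local drive and attached device gives you an inventory of paths and sizes, which is the raw material for deciding what to collect and when.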
2 – Migrating PSTs: Too much effort… for what reward?
Although the technical benefits of getting rid of PSTs might be obvious to the IT manager, the payback to the business of rounding up PSTs might not be so apparent to ‘the board’, leaving a migration project low down in the pecking order of priorities.
The difficulty is that PSTs are like locked personal filing cabinets, which means the information risk they pose to the business is difficult to quantify and therefore easily ignored.
What you don’t know doesn’t hurt you, right?
It’s highly likely that your enterprise PSTs contain very little information that is of merit or interest to the company. A large percentage of content may be personal: jokes, junk mail, images and sound files, etc.
The prospect of uploading what is essentially rubbish into the corporate email system, a dedicated archive or perhaps even a hosted email service is anathema to many enterprises. Apart from consuming large amounts of storage, the process could take too long and cost too much.
On the flip side, PSTs may well constitute vital business records such as sensitive emails, contact lists, confidential reports, personal data, budgets and financial results. They may even contain content that would be relevant to an eDiscovery case or would be covered under GDPR legislation.
In fact, PSTs are a convenient way to covertly extract emails, contacts and calendar entries out of the corporate email system…arguably this in itself is good reason to want to get them under control.
You might want to check out this ‘life hacker’ article and in particular the replies to it, to understand some of the issues at stake when users start extracting emails into PST files.
The reality is that to get the business to focus on the PST issue, you need some way of quantifying the potential risk posed to the business by the content of PSTs. Only then can an informed decision be made and a project justified.
3 – It will kill our network!
If you thought organisations would have cleared up their ‘PST act’ by now, you’d be mistaken.
The drive for better data governance (e.g. GDPR) and the decommissioning of on-prem infrastructure that goes hand-in-hand with the shift to the Cloud has become the tipping point for some of the largest organisations to get a handle on their PSTs. From a data governance perspective, consider that the data contained within these PST files is actually an unknown value to the organisation as it is typically dispersed data.
We’re just about to start a project that runs into several hundreds of TBs of PSTs and we’d be kidding ourselves if we thought it was going to be anything less than challenging.
On a call with the Microsoft Windows team last week, they seemed surprised when we said such a project could take well over a year to complete in full.
I guess Microsoft’s experience with PSTs to date has been to use drive shipping or a network upload, where the customer or a services partner has already done the legwork to corral the PSTs into a central location. What we don’t know is how long it took that customer to find all of the data and centralise it.
The reality is that unless PSTs are stored on network file shares (which, by the way, is not a Microsoft-supported model), PST data will need to be transferred from potentially thousands of individual users’ workstations scattered over a wide area network. Added to this, the files will need to be moved whilst users are connected to the network (i.e. during working hours, when network demand is at its heaviest).
So how can you avoid killing your network when gathering PSTs? And how can you shrink the overall project?
One thing we’ve discovered over the years is that, by single-instancing, organisations can massively reduce the amount of data they need to pull across the network.
PST files often get copied by end users and backup services to other locations on their machines, and the copies then drift apart as they are used. This means you are unlikely to have two PSTs that are exactly the same, so there’s little to ‘gain’ in excluding duplicate PSTs: you need to drill into the individual PST contents.
Our single instancing algorithm works at an individual email level and can either exclude or include the folder path in assessing email uniqueness. This is handy as a user could have two PST files where a percentage of the data from the first PST file has been copied in a second PST file, but under a different root folder.
The upshot of de-duplication from a network traffic perspective is that once we’ve encountered one instance of a given email, we don’t need to pull another copy across the network.
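As a rough illustration of email-level single instancing (not the actual algorithm described above), the idea can be sketched in Python as a fingerprint per email plus a set of fingerprints already seen. The `Email` fields, the choice of SHA-256 and the function names are all assumptions made for this example; a real implementation would pick its own properties to compare.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Email:
    """Simplified stand-in for a message extracted from a PST (hypothetical)."""
    folder_path: str
    sender: str
    subject: str
    sent_utc: str
    body: str


def fingerprint(email: Email, include_folder: bool) -> str:
    """Hash the fields that define 'the same email' for de-duplication."""
    parts = [email.sender, email.subject, email.sent_utc, email.body]
    if include_folder:
        parts.append(email.folder_path)
    digest = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def single_instance(emails: Iterable[Email],
                    include_folder: bool = False) -> Iterator[Email]:
    """Yield only the first occurrence of each distinct email."""
    seen = set()
    for email in emails:
        key = fingerprint(email, include_folder)
        if key not in seen:
            seen.add(key)
            yield email


# The same message filed under two different root folders in two PSTs.
first = Email("Archive 2015/Inbox", "a@example.com", "Budget", "2015-03-01T09:00:00Z", "Figures attached")
second = Email("Old Mail/Inbox", "a@example.com", "Budget", "2015-03-01T09:00:00Z", "Figures attached")

ignoring_folders = list(single_instance([first, second], include_folder=False))
respecting_folders = list(single_instance([first, second], include_folder=True))
```

With the folder path excluded, the two copies collapse into one and only one needs to cross the network; with it included, both are treated as unique, which may be what you want if the folder structure itself matters.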
Also, by culling emails based on a range of criteria to meet business and legislative requirements – date-based being a classic example, or size-based if moving to Office 365 – you can significantly reduce the amount of data to be transferred across the network.
In this PST e-Discovery case study, the amount of data pulled across the network was just 10% of the total PST volume (a tenfold reduction), and the transfer subsequently took a fraction of the time.
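A date and size cull of this kind boils down to a simple filter applied before anything is transferred. The Python sketch below is only an illustration under assumed names: `MessageInfo`, `should_transfer`, the cut-off date and the 25 MB limit are placeholders for this example, not recommended values or actual Office 365 limits.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MessageInfo:
    """Metadata needed to decide whether a message is worth moving (hypothetical)."""
    sent: datetime
    size_bytes: int


def should_transfer(msg: MessageInfo, cutoff: datetime, max_size_bytes: int) -> bool:
    """Keep messages sent on or after the cut-off that fit within the size limit."""
    return msg.sent >= cutoff and msg.size_bytes <= max_size_bytes


# Placeholder policy: keep mail from 2015 onwards, up to 25 MB per message.
cutoff = datetime(2015, 1, 1, tzinfo=timezone.utc)
max_size = 25 * 1024 * 1024

messages = [
    MessageInfo(datetime(2012, 6, 1, tzinfo=timezone.utc), 40_000),      # too old
    MessageInfo(datetime(2018, 6, 1, tzinfo=timezone.utc), 40_000),      # kept
    MessageInfo(datetime(2019, 6, 1, tzinfo=timezone.utc), 60_000_000),  # too large
]
kept = [m for m in messages if should_transfer(m, cutoff, max_size)]
bytes_saved = sum(m.size_bytes for m in messages) - sum(m.size_bytes for m in kept)
```

Because the decision only needs metadata, it can be made at the workstation, so culled messages never touch the network at all.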