Alleviating the PST Headache: Part 3 of 4 – Transvault

Posted by Liam Neate on Aug 17, 2017

4 Top Reasons PST Migration Projects Fail

2. Too much effort… for what reward?

3. It will kill our network!

If you thought organizations would have cleared up their ‘PST act’ by now, you’d be mistaken.

The drive for better data governance (e.g. GDPR), together with the decommissioning of on-prem infrastructure that goes hand-in-hand with the shift to the Cloud, has become the tipping point for some of the largest organizations to get a handle on their PSTs.  From a data governance perspective, the data contained within these PST files is effectively an unknown quantity to the organization, as it is typically dispersed data.

We’re just about to start a project that runs into several hundred TBs of PSTs, and we’d be kidding ourselves if we thought it was going to be anything less than challenging.

On a call with the Microsoft Windows team last week, they seemed surprised when we said such a project could take well over a year to complete in full.

I guess Microsoft’s experience with PSTs to date has been to use drive shipping or a network upload, where the customer or a services partner has already done the legwork to corral the PSTs into a central location.  What we don’t know is how long it took that customer to find all of the data and centralize it.

The reality is that unless PSTs are stored on network file shares (which, by the way, is not a Microsoft-supported model), PST data will need to be transferred from potentially thousands of individual users’ workstations scattered over a wide area network.  Added to this, they will need to be moved whilst users are connected to the network (i.e. during working hours, when network demand is heaviest).

So how can you avoid killing your network when gathering PSTs?  And how can you shrink the overall project?

One thing we’ve discovered over the years is that, by single-instancing, organizations can massively reduce the amount of data they need to pull across the network.

PST files often get copied by end users and backup services to other locations on their machines, but those copies soon diverge as mail keeps arriving. You are therefore unlikely to have two PST files that are identical, so there’s little ‘gain’ in excluding duplicate PSTs at the file level – you need to drill into the individual PST contents.

Our single-instancing algorithm works at an individual email level and can either exclude or include the folder path in assessing email uniqueness.  This is handy, as a user could have two PST files where a percentage of the data from the first PST file has been copied into a second PST file, but under a different root folder.

The upshot of de-duplication from a network traffic perspective is that once we’ve encountered one instance of a given email, we don’t need to pull another copy across the network.
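To make the idea concrete, here is a minimal sketch of item-level single-instancing. This is purely illustrative, not Transvault’s actual implementation: the message fields (`message_id`, `sent`, `subject`, `size`, `folder`) are assumed placeholders, and a real tool would use richer identifiers. The `include_folder` flag mirrors the choice described above of whether the folder path counts towards uniqueness.

```python
import hashlib

def message_key(msg, include_folder=False):
    """Build a de-duplication key from (assumed) stable message properties.

    With include_folder=False, the same email copied under a different
    root folder still hashes to the same key and is treated as a duplicate.
    """
    parts = [msg["message_id"], msg["sent"], msg["subject"], str(msg["size"])]
    if include_folder:
        parts.append(msg["folder"])
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

def unique_messages(messages, include_folder=False):
    """Yield only the first instance of each message seen across all PSTs.

    Once one instance has been encountered, later copies are skipped,
    so only one copy would ever need to travel across the network.
    """
    seen = set()
    for msg in messages:
        key = message_key(msg, include_folder)
        if key not in seen:
            seen.add(key)
            yield msg
```

The same email copied under a different folder tree is caught when `include_folder=False`, which matches the two-PSTs-with-different-root-folders scenario above.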

Also by culling emails based on a range of criteria to meet business and legislative requirements – date-based being a classic example, or size based if moving to Office 365 – you can significantly reduce the amount of data to be transferred across the network.
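A culling policy of this kind can be sketched as a simple predicate applied before transfer. The cutoff date and size cap below are invented placeholders, not recommendations; a real project would set them from business and legislative requirements and from the target system’s limits.

```python
from datetime import datetime

def should_migrate(msg, cutoff=datetime(2010, 1, 1), max_bytes=25 * 1024 * 1024):
    """Illustrative culling policy (placeholder thresholds).

    Drops messages sent before the retention cutoff, and excludes
    oversized items that would exceed a target-system size cap.
    """
    if msg["sent"] < cutoff:
        return False
    if msg["size"] > max_bytes:
        return False
    return True

def cull(messages, **kwargs):
    """Return only the messages that meet the migration criteria."""
    return [m for m in messages if should_migrate(m, **kwargs)]
```

Everything filtered out here is data that never has to cross the network at all, which compounds the savings from single-instancing.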

In this case study, the amount of data pulled across the network was just 10% of the total PST volume (a 10x reduction), and the transfer subsequently took a fraction of the time.

Coming next:

4. Users think PSTs belong to them

Stay tuned…