Data Discovery done right

One of the most common mistakes I see people make when they migrate IT systems is simply dumping all the old information from source to destination.

Migrating is the perfect opportunity for a business to look at the data it has and do a ‘clean up’. Think of what happens when you physically move locations. You look at what you have, you throw some stuff away and you reorganise the rest into the new location. Rarely, if ever, do you take the contents of each room and dump them into exactly the same room at the new location. Most people take moving house as an opportunity to ‘clean up’. Why, then, would you not take the same opportunity when migrating IT systems?

The approach I like to use is to divide digital data into four logical segments. In my experience this is a good rule of thumb, and each segment has a different action performed on it.

Firstly, there will be data you need to delete. By delete I mean erased from existence. Just because you can stick it on a USB thumb drive doesn’t mean you should. Data that should be deleted is typically duplicated information, large images or videos, and stuff that is no longer relevant. In my case, keeping information about Office 365 from more than 3 years ago makes little sense, as the product is now completely different. Thus, it should be deleted.
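Exact duplicates are the easiest part of the ‘delete’ segment to find automatically. As a minimal sketch, assuming the source data is a folder tree that Python can read (for example a mapped file share), files with identical content can be grouped by size and then by SHA-256 hash. The function names `file_hash` and `find_duplicates` are hypothetical, and this only catches byte-for-byte copies, not near-duplicates such as a re-saved image.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` that have byte-for-byte identical content.

    Files are first bucketed by size (cheap); only files sharing a size are hashed.
    Each returned group contains two or more paths, sorted for stable output.
    """
    by_size: dict[int, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups: list[list[Path]] = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have an exact duplicate
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for path in same_size:
            by_hash[file_hash(path)].append(path)
        groups.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return groups
```

Calling `find_duplicates(Path("S:/Shared"))` (a hypothetical share path) returns the duplicate sets for a person to review; deciding which copy to keep is still a business decision.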

The next segment of data to consider is stuff that is still relevant and should be archived. In most cases, archived data is kept for compliance. Here in Australia, the typical compliance time frame is 7 years, so data older than 7 years can probably go into the ‘delete’ bucket. Archive data is also stuff a business wants to keep to refer back to, or perhaps to base new material on. If the migration involves moving to the cloud, there are two options for dealing with data to be archived.

Firstly, archived data can be kept on-premises and not moved to the cloud. Typically, this means moving it to a USB hard disk or perhaps a local server or workstation. If and when it is needed, the device is hauled out, connected up and the data accessed.

The other option is to move data to be archived somewhere like a dedicated SharePoint Team Site. The advantage of doing this is that the data can be marked as read-only while still being indexed by Office 365. Indexing makes the information available to the business through the many search mechanisms in Office 365. The downside of moving archived data to the cloud? It has to be uploaded. If there is a lot of data that may take a while, but once it is there it becomes a lot more useful, in my book, than it would be if it remained on-premises. The other thing about moving data to be archived is that its structure is not altered; it is moved ‘as is’.

With the deleted and archived data now removed from the source location, you are typically left with around 50% of the original data. At this point my advice is to continue the migration from the outside in. That is, you migrate the oldest and the newest data first. I’ll explain why, but let’s first consider the oldest data.
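One way to turn ‘outside in’ into a work queue is sketched below. It assumes each remaining top-level source folder has already been reduced to a single last-modified date (how that date is derived, for instance from the newest file inside the folder, is an assumption), and it alternates between the oldest and newest remaining folders so both ends of the spectrum are handled before the middle. Alternating is only one reading of the approach; `outside_in_order` is a hypothetical name.

```python
from datetime import date


def outside_in_order(folders: dict[str, date]) -> list[str]:
    """Return folder names ordered from the outside in.

    Folders are sorted by last-modified date, then taken alternately from the
    oldest end and the newest end, so the middle of the range is migrated last.
    """
    ordered = sorted(folders, key=lambda name: folders[name])
    result: list[str] = []
    low, high = 0, len(ordered) - 1
    take_oldest = True
    while low <= high:
        if take_oldest:
            result.append(ordered[low])   # next oldest folder: move as is
            low += 1
        else:
            result.append(ordered[high])  # next newest folder: restructure
            high -= 1
        take_oldest = not take_oldest
    return result
```

For folders dated 2015, 2018, 2021 and 2024, this yields 2015, 2024, 2018, 2021.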

When you start shifting the oldest data, you’ll find that some of it can also go to the archive, but everything else should typically just be moved. By moved I mean taken to a new location without major changes to its structure. This means that if you have a folder of ‘old’ information, you typically move it and its contents directly into a new SharePoint Team Site Document Library. You then do the same with the next oldest source of information.

You don’t make major changes to the structure of ‘old’ data because, in theory, it is not accessed very often, so there is little value to be gained from a complete restructure. You just want to move it as is, because eventually it will end up being archived.

At the other end of the spectrum, the newest data, meaning the data that is most current and used constantly, should be restructured before being moved. This means that most of the current data won’t end up in the same structure it has on the source. The most current data should be moved to wherever makes the most sense for the business, given the new capabilities of the destination. For Office 365, this means that you shouldn’t ‘dump’ your current data into a single Document Library in the default SharePoint Online Team Site. Instead, you should probably be shifting some data into Microsoft Teams, user data into OneDrive for Business, some into Yammer and so on.

The other reason I advocate moving the most current data first is adoption. If you progress from the least current data to the most current, users typically won’t get the advantages Office 365 provides on the data they work with every day. You really want users to take advantage of everything Office 365 provides as soon as they have access to the system. Thus, you should always restructure the most current data and move it to wherever makes sense in Office 365, so users get the benefits immediately.

In summary, we can categorise the data on a source system as follows:

– Delete = old, duplicate and unwanted data. To be erased.

– Archive = data to be kept without changes made to its structure.

– Move = active but older data to be moved without changes to its structure.

– Restructure = the most current data, to be moved to new locations that take advantage of the features available.
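The four categories can also act as a first-pass triage of each item on a file share. The sketch below is a minimal illustration under stated assumptions: the 7-year threshold comes from the Australian compliance time frame mentioned earlier, while the 3-year archive cut-off, the 1-year ‘current’ window, the `Item` fields and the `classify` function are all hypothetical choices. Age is approximated as days divided by 365.25, and the output is meant for review by someone who knows the business, not for deleting anything automatically.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Action(Enum):
    DELETE = "delete"
    ARCHIVE = "archive"
    MOVE = "move"
    RESTRUCTURE = "restructure"


@dataclass
class Item:
    """One file or folder on the source system (hypothetical fields)."""
    path: str
    last_modified: date
    is_duplicate: bool = False
    keep_for_reference: bool = True  # needed for compliance or to refer back to


def years_between(earlier: date, later: date) -> float:
    """Approximate elapsed years using 365.25-day years."""
    return (later - earlier).days / 365.25


def classify(item: Item, today: date,
             retention_years: float = 7.0,
             archive_after_years: float = 3.0,
             current_within_years: float = 1.0) -> Action:
    """Assign one of the four actions based on age, duplication and relevance."""
    age = years_between(item.last_modified, today)
    if item.is_duplicate or age > retention_years:
        return Action.DELETE  # duplicates and data past the compliance window
    if age > archive_after_years:
        # Long untouched: archive it if still needed, otherwise it is no longer relevant
        return Action.ARCHIVE if item.keep_for_reference else Action.DELETE
    if age <= current_within_years:
        return Action.RESTRUCTURE  # most current data gets a new home
    return Action.MOVE  # active but older data is moved as is
```

Defaulting `keep_for_reference` to `True` errs on the side of archiving rather than erasing when nobody has said otherwise.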

So, you should never simply drag and drop your data from on-premises file servers directly into SharePoint. You need to take the time to clean it up and categorise it as shown above. Once that is done, migrate it to the place that makes the most sense in the new system. Doing so will ensure you get the maximum return on your investment in the new system and make the most of the information you bring forward. Carrying accumulated data from one system to the next unchanged is simply lazy, and fails to leverage one of your most important business resources.