Migrate the Entire Document
If we are going to migrate digital documents, we need to be very precise about what a document is, so that we migrate the document in its entirety. If we are not precise, documents risk losing their integrity during migration.
Migrate Documents or Data
The term “unstructured data” is often used for digital documents. It means that, unlike financial data for example, the information in documents is not stored in database tables but is embedded in text, and that text is held in files such as Word files. Most people will therefore readily see that moving structured data and moving unstructured data are quite different tasks. In the first case you move data from one database table to another; in the second, you move files.
But there is more to the story. What we colloquially call a document, a Word file for example, is really its content. A document’s content can be almost anything: sound, images or video, even text messages or tweets, and it can consist of several components (files).
But when we talk about migrating digital documents, it is usually from one document management system to another, and such a system stores metadata alongside the content file. Metadata can describe the document, describe the context in which it was created, record its history, and much more. Metadata is also used to retrieve documents, but it has a different and deeper function, which we will look at in more detail in the next section.
We are now able to specify what a document is:
A document consists of both content and associated metadata.
This means that when moving documents, there are both files and metadata to move. A document migration is thus fundamentally different from a “normal” data migration: besides the structured metadata, it also involves one or more content files, and the two components are bound together by a context that must be preserved.
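To make the two-part definition concrete, here is a minimal sketch in Python. It assumes a document can be represented as a mapping of file names to bytes (the content) plus a flat mapping of metadata fields; the `Document` class, the `missing_parts` function and the field names `created` and `creator` are hypothetical illustrations, not part of any particular document management system. The check reports anything that did not arrive intact, on either side of the document.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical model: a document is content file(s) plus associated metadata."""
    content: dict[str, bytes]                              # file name -> raw bytes
    metadata: dict[str, str] = field(default_factory=dict)  # field name -> value


def sha256(data: bytes) -> str:
    # In a real migration each system would compute this checksum on its own side.
    return hashlib.sha256(data).hexdigest()


def missing_parts(source: Document, target: Document) -> list[str]:
    """List content files and metadata fields that did not arrive intact."""
    problems = []
    # Content: every source file must exist in the target with an identical checksum.
    for name, data in source.content.items():
        migrated = target.content.get(name)
        if migrated is None:
            problems.append(f"content file missing: {name}")
        elif sha256(migrated) != sha256(data):
            problems.append(f"content file altered: {name}")
    # Metadata: every source field must exist in the target with the same value.
    for key, value in source.metadata.items():
        if key not in target.metadata:
            problems.append(f"metadata missing: {key}")
        elif target.metadata[key] != value:
            problems.append(f"metadata changed: {key}")
    return problems


if __name__ == "__main__":
    src = Document({"report.docx": b"..."},
                   {"created": "2021-03-04T10:15:00Z", "creator": "jdoe"})
    tgt = Document({"report.docx": b"..."},
                   {"created": "2021-03-04T10:15:00Z"})
    print(missing_parts(src, tgt))  # ['metadata missing: creator']
```

Moving only the files here would look like a success, yet the check still flags the document as incomplete because one of its metadata fields was left behind.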
Many people call document migration data migration. You can call it what you like – we just call it migration. Just remember that when it’s documents you’re migrating, there are BOTH files AND metadata to migrate.
The Importance of Metadata
Metadata is what helps us retrieve documents in the new document management system. From a user perspective, it is very important that metadata is good, consistent, and descriptive of the business area in question.
That matters a great deal, but it is not the point we want to make here. We want to look at another type of metadata: metadata that all documents carry and that gives each document context and therefore authenticity. This requires further explanation:
When a document is created in a system, most systems automatically create metadata such as the creation date and the creator. This metadata is normally not altered afterwards, so the timestamp can help prove that this person created the document at that time. It thereby helps to give the document authenticity.
There are many other examples of metadata that play a similarly central role in formally placing the document in context. If the document loses this data during migration, or if it is migrated in such a way that it is afterwards questionable whether the data has been tampered with, the document is no longer as reliable. The document is then said to have lost its integrity.
In some situations, this is a nuisance; in others, it is a disaster. It is essential to determine which metadata is important for the integrity of the document and to ensure that it is migrated properly. It is also key to be clear about the regulatory conditions you are working under. In some industries, you risk losing the permission to operate parts of your business.
Conclusion and Call to Action
The most basic prerequisite for preserving the integrity of the document in a migration is to know precisely what a document is.
A document thus has two essential components, namely a content file (or files) and associated metadata. The content files must be identified. And so must the metadata.
Within the metadata, we need to identify which fields are significant for the integrity of the document, so that we can take particular care when migrating them.
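One way to act on this distinction is to treat discrepancies in integrity-significant metadata differently from other discrepancies when checking a migration. The Python sketch below assumes metadata is available as flat dictionaries from before and after the migration; the set `INTEGRITY_FIELDS` and the function `classify_metadata_changes` are hypothetical, and the actual list of significant fields must come from your own analysis and regulatory situation. A lost or altered integrity field is reported as an error, while other differences are reported as warnings.

```python
# Hypothetical choice of integrity-significant fields; replace with the result
# of your own analysis of which metadata the document's integrity depends on.
INTEGRITY_FIELDS = frozenset({"created", "creator"})


def classify_metadata_changes(before: dict[str, str],
                              after: dict[str, str],
                              integrity_fields: frozenset[str] = INTEGRITY_FIELDS,
                              ) -> dict[str, list[str]]:
    """Split metadata discrepancies into errors (integrity-significant fields
    lost or altered) and warnings (any other field lost or altered)."""
    report: dict[str, list[str]] = {"errors": [], "warnings": []}
    for key in sorted(before):
        if key in after and after[key] == before[key]:
            continue  # field migrated unchanged
        bucket = "errors" if key in integrity_fields else "warnings"
        status = "missing" if key not in after else "changed"
        report[bucket].append(f"{key} {status}")
    return report


if __name__ == "__main__":
    before = {"created": "2021-03-04T10:15:00Z", "creator": "jdoe", "title": "Q1 report"}
    # Illustrative failure: the creation date was overwritten with the migration date.
    after = {"created": "2021-06-30T08:00:00Z", "creator": "jdoe", "title": "Q1 Report"}
    print(classify_metadata_changes(before, after))
    # {'errors': ['created changed'], 'warnings': ['title changed']}
```

The design choice is simply that not all metadata differences weigh the same: a changed title may be tolerable, while a changed creation date undermines the document's authenticity.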
Unfortunately, this quite often does not happen before a migration starts. In the worst case, the consequence is disastrous: after the migration, the documents no longer have any value.
Our experience and best advice for a controlled migration, one where the documents come through intact, are gathered in our Guide to Document Migration.