GUIDE TO DOCUMENT MIGRATION

New system! Now we just need to get the documents transferred from the old system.
It’s just not as easy as one would think… The project gets delayed, users can’t find the documents, data is lost, etc.
Hopefully this guide can help you avoid just that.

1. INTRODUCTION

This guide covers the migration of documents. Migrating documents differs from migrating (structured) data in several ways, and these differences are often what leads to misjudging an impending document migration. Here, we walk through a migration from A to Z.

1.1. ALL SECTORS – INCLUDING THE REGULATED ONES 

The experiences and methodologies shared in this guide largely stem from the heavily regulated pharmaceutical industry. However, we explain our methods in broadly applicable terms and only note, in brief comments, where regulated industries have particular requirements. Towards the end, we finish with a section specific to regulated industries.

1.2. DIFFERENT REASONS FOR DOCUMENT MIGRATION

Documents are migrated for three main reasons:

  • A new system is being implemented and the content from the old one needs to be loaded into the new one. We call this (system) migration.  
  • A large delivery of documents from a supplier or partner needs to be loaded into the local solution. We call this import.
  • Documents from a running solution must be moved into the archive. We call this archiving.   

The guide focuses on what we call (system) migration, but much of what is described below will be equally valid for the other two situations.

2. TABLE OF CONTENTS

  1. INTRODUCTION
  2. TABLE OF CONTENTS
  3. NEW SYSTEM – APPENDIX MIGRATION
  4. BEFORE YOU START
  5. PHASES OF THE MIGRATION
  6. MIGRATION-CENTER
  7. LIFE SCIENCE
  8. CONCLUSION

3. NEW SYSTEM – APPENDIX MIGRATION

The new system is on its way. Business users have been involved in requesting, designing, and setting up the new solution and are looking forward to putting it into use. The focus has been on business processes, functionality, and ease of use. A natural, and very important, focus. 

3.1. HISTORY

However, there are also the documents and the document history in the old system. Some documents may need to be archived, others discarded in a controlled manner. What has to be migrated to the new system must be identified; it is rarely either wise or correct to assume that everything must be migrated. Managing the documents that are not migrated is a separate project, and we cover those considerations elsewhere. The focus of this guide is on the documents eligible for migration, and on how to migrate them so that they can be found, used and versioned in the new system.

3.2. USER ACCEPTANCE OF THE NEW SYSTEM

It is essential for the user acceptance of the new system that the documents are migrated in a way that does the new system justice. We have seen unfortunate cases where otherwise excellent systems are simply declared unusable by users because they cannot find and use their documents after migration. This should be avoided for obvious reasons.

3.3. NEW SYSTEM – ALL INCLUSIVE

The supplier of the new system often offers a good deal on migration, with a promise to import the documentation right before go-live. They just need you to provide your documents and metadata in a particular specified format. So now we have a plan. Or do we?

It sounds very manageable: the vendor only needs to be handed a bunch of files and some spreadsheets right before go-live. In rare cases, that is in fact all there is to it. Most often, however, it is not, and sometimes it is much worse.

We’re not out to vendor-bash. The vendors offer exactly what they need to, and they will very often be the best at getting the documents safely into the system. That said, if the new system is a well-known standard system, it may matter less whether the vendor is involved.

However, we want to focus on what appears to be an easy task – “just delivering” document files and metadata. Many are deceived by this, and the consequences can be huge. Let’s start at the beginning.

4. BEFORE YOU START

4.1. UNDERSTAND YOUR MIGRATION

Migration can be problematic if it is underestimated and taken too lightly. It is therefore necessary, first and foremost, to understand the migration that one is facing. Below are some brief points on that, as well as references to other articles that go into more depth on the individual aspects and claims. 

We hope that the material will inspire you to get started on further understanding your own migration projects. 

Recommended articles:

4.1.1. THE SHORT VERSION

Document migration has some aspects that are particularly complex. These largely stem from the difference between document migration and data migration, namely the fact that a document consists of two parts: the content and the metadata.

Some of this metadata is merely descriptive and used for retrieval. Other metadata must be considered an integral part of the document, for example because it proves that the document is authentic. Documents also have relations to other documents, other versions, other formats, other files on the same case, etc., and these relations are themselves a type of metadata.

Documents thus have two parts and a lot of relations, and these must be handled without losing the links. Some of the articles referenced above explain how documents can lose their usability and integrity (and thus, for example, their legal validity) if this handling is done incorrectly. On top of that comes the complexity of the very large volumes of documents that often have to be handled.

The result is that we are faced with a critical and complex operation that is all too easily underestimated.

4.2. VALUE-ADDING MIGRATION 

The historical documentation does not attract the same attention as the exciting new system. That is understandable, but a pity, because migration is a fantastic opportunity to clean up and get high-quality data into the new system. This in turn improves both the user experience and the value of the system.

Even if the old system has been used with care, it often turns out (we almost dare say “always”) that the data is not quite what one thought. Perhaps a significant number of documents are filed under “miscellaneous”. Perhaps documents with no actual purpose or value have been filed. Perhaps the owner of the documents is long gone, etc. Each of these issues may be minor, but their combined volume is typically overwhelming, and it makes the job of delivering the spreadsheets and files to the system vendor grow out of hand.

If we look beyond the new system and beyond the organisation, improving the quality of documentation is something that contributes to efficiency, quality, and compliance in general. 

Migration can be seen as an annoying additional cost of implementing a new system. On the other hand, it might be welcomed as a value-adding activity. If it is to create value, the right skills must be brought to the job and the work must be taken seriously. But this does not have to make it more expensive.

4.3. SET UP THE RIGHT TEAM

Although there is a lot of heavy IT involved in a migration, it should not be regarded as an IT exercise. First and foremost, it is a business exercise. Value is created by addressing documents and data in a completely systematic way, and by cleansing and enriching them in the process.

A migration needs a wide range of knowledge: 

  • Technical knowledge of the old system and the new system
  • Business understanding of the use of the old system and intended use of the new system 
  • Knowledge of the organisation’s master data 
  • Knowledge of the rules and regulations the company is subject to (especially for highly regulated companies) 
  • Document management knowledge 

In particular, we would like to point out that many organisations actually have a trained archivist, records manager, information specialist or similar somewhere in the organisation. This is a skill that is particularly useful to include in the migration process. Unlike the rest of the team, this person will typically be interested in the historical documentation, the traceability and the critical metadata, and that skill is really needed.

This is where value is created.

5. PHASES OF THE MIGRATION

The reality of migration is a lot of trial and error and unexpected surprises in the data. We can’t ignore that reality, but we can decide to handle it. Dividing the migration into phases gives it structure. We think the structure is a great help, but it should be applied with this reality in mind. For instance, something is bound to come up that needs to be analysed even though the analysis phase is long over.

Our recommended structure consists of a pre-analysis, followed by a choice of methods and technologies based on its results. The roundabout illustrates that analysing, configuring and testing the chosen methodology and technology is an iterative process. When the results are satisfactory, more formal testing follows. If that succeeds, the migration itself can proceed. Finally, the migration is reported on.

In the following sections, each phase will be discussed.

5.1. PRE-ANALYSIS

The purpose of the pre-analysis is to identify the overall requirements for the migration, to make informed choices about technology and methodology.

A pre-analysis can be very long as there is always more to analyse. It is important to set clear goals and stop when you know enough to make a decision.

Typical issues to be addressed are: 

  • Approximate scope (which documents and data) 
  • Overall technical differences between source and target systems 
  • Overall structural differences between source and target systems
  • The data quality in the source system 
  • Regulatory requirements 
  • Target system implementation strategy (e.g., incremental or big bang)
  • Business requirements, e.g., timing, possible decommissioning, etc.
  • Financial, resource or time constraints

5.2. METHOD AND TECHNOLOGY SELECTION

Based on the pre-analysis, a decision is made on how to migrate. Three main questions need to be answered.

  • Technology:
    Do you use a migration tool, homemade scripts or is it “dump and load”?
  • Segmentation:
    Should the migration be “big bang” or phased, and if so, how? 
  • Quality assurance:
    How will we assure the quality of the migration (in life science: define our validation approach), and how will we report on the migration?

5.2.1. TECHNOLOGY

In the context of the technology choice, we have the following advice and experiences to share: 

If the two systems are structurally similar, e.g., the new system is a newer version of the old system, it may be possible to detach (dump) the entire database and attach (load) it in the new system. If so, the migration task is purely technical, and some of the above-mentioned concerns about integrity and about having the right skills in the project can be dismissed.

A migration is basically an export from the source and import into the target system. Often the supplier of the new system has offered to import to the new system. If the source system has a sufficient export option, and the output can be used by the import feature, this approach could be the right solution. 

Often, however, the new system and the old system rely on different metadata. This means that metadata must be transformed in the process. There may also very well be a need for enrichment or clean-up during the migration. If this is the case, we recommend using a commercial tool designed to handle metadata mapping and metadata value transformations. The IT department may offer to develop something, but this is seldom worthwhile compared to buying or leasing an existing tool.

Depending on the platforms to be migrated to and from, there is a variety of tools on the market. We have a favourite that we rely on unless circumstances dictate otherwise: the migration-center tool from the German company fme AG. In many cases, this tool is a really good choice. It can connect to a wide range of technologies out of the box, and it has all the functionality we have needed so far in terms of mapping, transformation and reporting. Artificial intelligence for classification is starting to appear in the tool. fme AG is not a frontrunner here. Rather, it exercises deliberate caution to preserve the tool’s core capability, which is full, documented control throughout the migration.

5.2.2. SEGMENTATION OF THE MIGRATION

There are physical limits to how much data can be moved in a given time. Sometimes it is simply not practically possible to shut down on Friday night, migrate over the weekend and be ready on Monday morning, even though this is very often requested. In these cases, the migration has to be broken down into smaller chunks.
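Whether a big bang fits the available window is, to a first approximation, simple arithmetic. The sketch below is a hypothetical back-of-the-envelope check. It assumes a constant average throughput measured during test runs, plus time reserved for verification and a possible rollback. All names and figures are illustrative, not benchmarks.

```python
import math
from dataclasses import dataclass


@dataclass
class WindowEstimate:
    """Hypothetical planning figures; measure your own throughput in test runs."""
    documents: int             # documents in scope
    docs_per_hour: float       # assumed constant average throughput
    window_hours: float        # length of the downtime window
    verification_hours: float  # reserved for verification and approval after the load
    rollback_hours: float      # reserved for a rollback if something goes wrong

    def load_hours(self) -> float:
        return self.documents / self.docs_per_hour

    def usable_hours(self) -> float:
        # Time left for loading once verification and rollback are reserved.
        return self.window_hours - self.verification_hours - self.rollback_hours

    def fits_window(self) -> bool:
        return self.load_hours() <= self.usable_hours()

    def chunks_needed(self) -> int:
        # Number of equally sized windows needed if one is not enough.
        if self.usable_hours() <= 0:
            raise ValueError("verification and rollback alone exceed the window")
        return math.ceil(self.load_hours() / self.usable_hours())
```

For example, 1,200,000 documents at 20,000 documents per hour means 60 hours of loading. In a 61-hour weekend window with 6 hours reserved for verification and 8 for rollback, the load does not fit, and two chunks are needed.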

If users are onboarded to the new system in phases, and data should be migrated accordingly, the migration must be segmented to match.

It is also often decided not to migrate documents from in-progress projects. These are left in the old system until the project is finished, which means that a catch-up migration must be performed later. Only then can the old system be closed completely.

In short, you very often end up having to migrate in stages, adding to the complexity of the project. You need to keep track of what has been migrated, deal with new documents, and documents that might have been accidentally edited in the old system after the initial migration. 
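At its core, keeping track of what has been migrated means comparing the current state of the source with a ledger recorded at migration time. Below is a minimal sketch. It assumes that each document has a stable ID and some version or modification marker exposed by the old system. The function and the ledger format are hypothetical.

```python
def plan_catch_up(source: dict[str, str], ledger: dict[str, str]) -> dict[str, list[str]]:
    """Compare the old system's current state with a ledger of what was migrated.

    source: doc_id -> version or modification marker as it is now in the old system
    ledger: doc_id -> the same marker, recorded when the document was migrated
    """
    plan: dict[str, list[str]] = {"new": [], "changed": [], "unchanged": [], "gone_from_source": []}
    for doc_id, marker in source.items():
        if doc_id not in ledger:
            plan["new"].append(doc_id)  # created after the initial migration
        elif ledger[doc_id] != marker:
            plan["changed"].append(doc_id)  # edited in the old system after migration
        else:
            plan["unchanged"].append(doc_id)
    for doc_id in ledger:
        if doc_id not in source:
            plan["gone_from_source"].append(doc_id)  # deleted or moved in the old system
    for ids in plan.values():
        ids.sort()
    return plan
```

The documents in `changed` are exactly the ones edited in the old system after the initial migration. They call for a deliberate decision, not an automatic overwrite of the copy in the new system.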

It gets even worse if you have to allow documents to potentially be modified in both systems by two different user groups and these need to be synchronised. This should be avoided, if at all possible, because the complexity becomes insanely high.  

The thing to decide is whether to subdivide the migration into smaller chunks or go for the big bang. A need to subdivide is usually a good indicator that a commercial migration tool is needed.

5.2.3. QUALITY ASSURANCE

If the documents you are dealing with are records, then migration is a critical action. We simply state this here. The articles referenced in the “Understand your migration” section above explain and justify it.

This means that we need to have an apparatus in place that ensures – and can document – that records are properly migrated. There are two tracks to this: 

  • One is to ensure that the method and technology are tested and work correctly 
  • The second is to quality assure and document the migration execution  

If the technology is homemade, the testing work will obviously be quite extensive.

5.3. ANALYSIS, CONFIGURATION AND TESTING

This phase consists of continuously examining, testing and improving until you get the result you want.

  1. The first thing to work on is to define the subset of the source that you want to migrate. This could be all documents of a certain type, from a certain department, in a certain state, related to certain products or cases, or many other things. 
  2. The next thing to manage is classification and mapping, i.e., how the objects in the old system correspond to those in the new one. Often the systems operate with some basic types of documents, and these are the ones we need to match between the old and the new system. For example, the old system might be split logically by type of report, while the new system is split by the type of product the report describes.
  3. Next, focus on the metadata of each of the above classes or types. For example, in the old system there was a “title” on a report, whereas in the new system it is “name”. The value in title simply needs to be mapped to name so that during import the value is put into name. 
    But the rules can be, and typically are, much more complicated and involve transformations. Sometimes a rule does not give the expected result when it meets the actual document metadata. The rule may of course be wrong, but often it turns out that a number of documents are different than expected. Typically, the rule then has to be adjusted to take the particularities of that subset into account.
     
  4. Finally, we have the values. In many systems there is a list of allowed values for certain fields. Days of the week, months, countries, product numbers, client names, etc. This means that the values you migrate into that field should match the allowed ones. 
    This may lead to the need to map the values in each value list (trivial example: ‘Mandag’ should become ‘Monday’, ‘Tirsdag’ should become ‘Tuesday’, etc.). The result of this phase is that these rules and mappings work for the subset(s) you have been working with.
    The tool we often use has an option to simulate a migration, which is really useful. It collects all the information from the source system so that you can experiment with it without disturbing the source system. Once you have set up your rules, you can apply them to all documents in the subset you are working with and see the result. Did it behave as expected? Without revealing a big secret, we can tell you that the answer is “no” for the first many attempts. At some point you need to have the results checked by others who know more about the content of the documents. It is then convenient to be able to export your simulations to spreadsheets and send them out for review. Eventually you are satisfied. You can then prepare another group of documents, or get ready to import these documents into the receiving system.
  5. Almost there. We must now consider whether we need a rollback strategy. For example, should a script be constructed that removes the migrated documents and cleans up after a failed migration? The technology we often use generates a rollback script by itself.
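The mapping rules in steps 3 and 4 can be sketched as plain data plus one transformation function. This is a minimal illustration, not how migration-center stores its rules. The field names and value lists are hypothetical and reuse the “title”/“name” and weekday examples above. Values with no mapping are reported rather than guessed, which is how the unexpected subsets tend to surface during simulation.

```python
# Hypothetical mapping rules: source field -> target field.
ATTRIBUTE_MAP = {"title": "name", "weekday": "weekday"}

# Hypothetical value lists, keyed by target field: source value -> allowed target value.
VALUE_MAPS = {
    "weekday": {"Mandag": "Monday", "Tirsdag": "Tuesday", "Onsdag": "Wednesday"},
}


def transform(source_meta: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Map one document's metadata; return target metadata and problems for review."""
    target: dict[str, str] = {}
    problems: list[str] = []
    for source_field, target_field in ATTRIBUTE_MAP.items():
        if source_field not in source_meta:
            problems.append(f"missing source field '{source_field}'")
            continue
        value = source_meta[source_field]
        value_map = VALUE_MAPS.get(target_field)
        if value_map is not None:
            if value not in value_map:
                # Do not guess: an unmapped value usually means the rule needs adjusting.
                problems.append(f"unmapped value '{value}' for '{target_field}'")
                continue
            value = value_map[value]
        target[target_field] = value
    return target, problems


def simulate(subset: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Dry run over a subset (doc_id -> metadata); return problems per failing document."""
    report: dict[str, list[str]] = {}
    for doc_id, meta in subset.items():
        _, problems = transform(meta)
        if problems:
            report[doc_id] = problems
    return report
```

A report like this maps naturally onto the spreadsheet review described in step 4. Each failing document is listed together with the reason it failed.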

5.4. FORMAL TEST/PILOT

So far, we have tested and checked the result. Now it’s time for a formal test that leads to us being ready to perform the migration. What a formal test should contain depends on the industry and, to a large extent, on the choice of technology and methodology.

It is customary to do a pilot migration, in which a small but representative set of documents is migrated. You can then go over them in detail and make sure that what you expect to happen is actually happening. Typically, this testing should not be done by the technicians and business analysts who have been part of the migration. It should be done by the business users. It can be thought of as a user acceptance test of the outcome of the migration, much like the acceptance test one typically does of the system itself.

In the life science industry, one should expect to have to qualify the technology and methodology and subsequently validate the migration. Here we are at the qualification stage. Strator has templates ready for IQ, OQ and PQ/UAT, which can be adapted to the circumstances and your QMS.

The result of this – regardless of industry and form – is of course that we dare to start our migration.  

5.5. MIGRATION

It is now time to launch the migration itself. This could be a big bang migration, where we close the old system for good on Friday night and open the new one on Monday morning, and in the meantime the migration is done. 

More often, the migration consists of smaller migrations, as indicated above. Either way, you will often have a closing window during which you get (the last parts of) the migration done and close the old system or parts of it.

So, in short, this phase is about doing what we have been practicing – migrating the documents and testing that it went well. 

In some situations, you need to run to a tight schedule and have worked out the last point at which a rollback is still possible. When problems occur, it is understandable that you keep trying to solve them and don’t focus on anything else. That’s when you need a well-calculated schedule that ensures you stop in time to roll back. Nothing general can be said about this, other than to be aware of whether it might be a problem in your case.

There may also be some mandatory steps in the execution from a quality assurance point of view. For example, the migration will typically have to be verified (tested) and approved before the new system is opened to users.

One must also be prepared for some documents to fail. Something odd will always happen, and it is essential to be clear in advance about what you are going to do about it. Often you will collect the failing documents in a list and end the migration with a deficiency list, which you then deal with afterwards. However, there may be reasons why this is not good practice.

5.6. REPORTING

After migration, it is good practice to collect and archive documentation of the migration’s completion. In many contexts this will be an explicit requirement.

The documentation may include a log from the technology used. Ideally, this documents what transformations the data has undergone along the way and shows that the content files in the receiving system are identical to those in the source system. 
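Identical content files are typically demonstrated with cryptographic checksums. Below is a minimal sketch using SHA-256 from Python's standard `hashlib`. It assumes that the content files from both systems can be exported to a file system. The functions and the report layout are hypothetical and do not reflect the log format of any particular tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to handle large files."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_content(pairs: list[tuple[str, Path, Path]]) -> list[dict[str, object]]:
    """For each (doc_id, source_file, target_file), record both digests and whether they match."""
    report: list[dict[str, object]] = []
    for doc_id, source_file, target_file in pairs:
        source_digest = sha256_of(source_file)
        target_digest = sha256_of(target_file)
        report.append({
            "doc_id": doc_id,
            "source_sha256": source_digest,
            "target_sha256": target_digest,
            "identical": source_digest == target_digest,
        })
    return report
```

Storing both digests, not just the match flag, means the report itself can later serve as evidence that each file arrived unchanged.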

Documentation of the verification of the final migration, and documentation from the phase where we formally tested the system, will both constitute key evidence that the migration is in order.

All this documentation should be stored. Indeed, it constitutes the proof that the documents have maintained their integrity during the migration.

6. MIGRATION-CENTER

As mentioned earlier in this guide, there is a tool that we use and recommend when relevant. It is called migration-center and is owned by the German consulting company fme AG.

Like us at Strator, fme AG has historically worked a lot with the large document platform Documentum. The migration tool was originally developed to migrate documents into Documentum, which is why we at Strator have known it for around 15 years.

It is not a bad heritage to have. Documentum is a very large platform that supports very complex relationships between documents. Since the tool can handle that complexity, there is not much on the market it cannot support. In addition, Documentum has historically been widely used in the life science industry, so meeting strict regulatory requirements is part of the tool’s core.

The tool has since been generalised, and the migration engine at its core can now be used with different combinations of source and destination, the so-called “migration paths”.

6.1. STRUCTURE OF THE TOOL

For now, the tool is a traditional client-server tool, designed to be installed inside the customer’s firewall. The source of a migration is currently usually on-premises (i.e., not cloud), so you get the best speed by running the migration host alongside it.

The tool is operated from a client, and behind it runs a migration server and a database.  

From the operator’s point of view, the tool works as described below.

6.1.1. SCANNING
  • The source is scanned for documents  
  • All available metadata for the documents is retrieved into the migration database from the source 
  • You can also choose to download the document file itself into the migration tool’s file system
6.1.2. PROCESSING
  • A subset of the documents is selected 
  • For that subset, metadata is mapped, and rules are set up for transformations 
  • A transformation is performed, which means that migration-center simulates the migration for the selected subset and presents the result
  • A validation is performed, which means that migration-center technically checks the calculated values. It will check that the data type is correct, e.g., that a date field contains a date.
  • You keep correcting rules and mappings, transforming and validating, until you have a satisfactory dataset.
  • Now it makes sense to get business experts involved and check that they are getting what they expect.
  • You work with and prepare one subset of the documents at a time, choosing subsets for which it is simplest to set up rules covering the whole subset.
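The technical validation step in the list above (checking, for example, that a date field really contains a date) can be illustrated with a small rule set. This is not migration-center's implementation. The target schema, the field names and the allowed values are all hypothetical.

```python
from datetime import date

# Hypothetical target schema: field -> (kind, required, allowed values or None).
SCHEMA = {
    "name": ("text", True, None),
    "effective_date": ("date", True, None),
    "status": ("text", False, {"Draft", "Approved", "Obsolete"}),
}


def validate(meta: dict[str, str]) -> list[str]:
    """Technically check transformed metadata against the target schema."""
    errors: list[str] = []
    for field, (kind, required, allowed) in SCHEMA.items():
        value = meta.get(field, "")
        if value == "":
            if required:
                errors.append(f"{field}: required value missing")
            continue
        if kind == "date":
            try:
                date.fromisoformat(value)  # ISO 8601 date, e.g. 2021-03-15
            except ValueError:
                errors.append(f"{field}: '{value}' is not an ISO date")
        if allowed is not None and value not in allowed:
            errors.append(f"{field}: '{value}' is not an allowed value")
    return errors
```

A document only moves on once this list is empty. Otherwise, the rules or the mappings are corrected and the transformation is run again.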
6.1.3. IMPORT
  • After countless simulations, you now have the confidence to start importing into the destination system. Following the phases presented earlier in the guide, this is where you perform the formal test before starting the import into the production system.
  • Once the import is started, there are basically two options. One is to import the data that was loaded during the scan. The other is to have the tool perform a new scan, retrieve the documents in their current state and process them according to the rules you have set up. The choice depends on whether you need the latest state of the documents.
  • During the import, all transactions will be logged, and the log can be pulled up afterwards and formatted into a migration report.

6.2. GATE MODEL

There is one more subtlety to the tool that is worth highlighting: it is built on a gate model. This means that there is a fixed sequence of states a document can be in inside the migration tool. For example, a document can be transformed, validated or imported. It is impossible for a document to be validated before it is transformed, or imported before it is transformed and validated.

That is, migration-center keeps track of every document for us: whether it is ready to be imported, and how the import went. When a document fails, migration-center knows it. If you fix something in a rule and run your migration again, the tool retries that document but skips the documents that have already been migrated successfully. It’s a tremendous relief and support in the practical execution of the migration.

6.3. COMPLEX AND LARGE TOOL

The migration-center is not a tool you learn to use in a day. It can do a lot and is therefore complex. But with a little assistance and training from people like us, you can become capable of performing your own migrations if you want to.  

Now, we are not in the business of quoting prices for other companies’ tools, but don’t dismiss it on the assumption that it must be very expensive. It was expensive when it was exclusive to Documentum, but for most people the price now is a pleasant surprise. Several flexible licensing models are also available, including leasing for the duration of the project.

7. LIFE SCIENCE

This section is specific to the handling of GxP-critical material, which of course requires special care. Everything we have discussed above applies here too, or perhaps the wording should be “in particular here”.

In principle, Strator works in all industries, but in practice the life science industry is our home turf. It is in life science that our deep expertise is particularly needed, and we are used to dealing with the regulated environment.  

The migration-center tool, which we find most useful, has been used in many large pharmaceutical companies. The supplier – fme AG – is open to auditing and can provide the SOP for the development of the tool on request.  

In other words, it is a tool and a supplier suited to the pharmaceutical industry.  

Functionally, the tool is highly suitable for life science. We would particularly like to highlight the thorough logging: quite simply, there is full traceability at document level. It does not get much better than that.

On top of, or around, the tool, Strator has developed a process that includes a documentation package. This package can serve as a basis for the documentation required by your QMS. We have various practical templates that we use for gathering requirements and specifying the migration. We also have an “operation handbook” for the migration tool, IQ and OQ test cases in stock from previous work, and so on, and we are happy to share and reuse them.

8. CONCLUSION

Migration is a discipline that is often underestimated, at the expense of data quality or even of the new system itself. In many people’s view, migration is “straightforward” and “something IT can fix”. In reality, this is rarely the case. If this guide has helped you see the migration you are facing more clearly and anticipate where problems may lie hidden, then it has served its purpose.

8.1. THE EXPERIENCE BEHIND

The consultants at Strator have worked as document management specialists, with a technical or process perspective, in our current and previous jobs, many of us for 10 to 20 years. Time and again, we have seen migrations being mismanaged.

Eventually, we agreed to put migration on a formula. We gathered all our experience and created a model that we work from and constantly improve. We reached out to fme AG, whom many of us knew from our shared past with the Documentum system, and were allowed to use their tool. It supports our need for full control throughout the process very well.

8.2. WANT TO KNOW MORE

This guide is our attempt to share our experiences and methods. The guide is by no means exhaustive, but hopefully helpful to you. 

The knowledge section on our website contains several articles about the inconveniences of migration and other operations on documents. 

Because that’s what we do: Our specialty is managing, organizing, classifying, and migrating large volumes of electronic documents.

Please visit our knowledge section to learn more and feel free to reach out and get in touch – your questions are welcome.