Controlled testing has become standard procedure when IT systems are installed and updated. Unfortunately, the same rigour is rarely applied to other kinds of data handling. Undetected errors can have significant financial and quality consequences when large amounts of data are moved or cleaned up. Quality assurance and testing efforts should naturally be tailored to risk and severity, but a risk assessment and corresponding tests should always be conducted when handling data.

With digitalization and the escalating amount of data, there is a general understanding of the necessity of data migration and clean-up as a discipline. Most people have an idea of what it entails to define, plan, and execute such projects. However, few can clearly state what it takes to ensure that everything has gone as expected. In this article, we focus on the importance of controlled testing in connection with data clean-up and migration and share our experiences on the subject.


Projects involving clean-up or migration of data and documents are complex endeavours with many stakeholders. Each stakeholder has their own deliverables, which together make up the project as a whole. But how do we know if everything has gone as expected in the end? Have there been unexpected events? Is the quality as expected? And how do we handle the things that may have gone wrong?

These are the questions that a controlled test should provide us with answers to. The test can give us certainty about quality and security, as well as a documented, validated, and verified conclusion on our clean-up and migration.


In its simplest form, controlled testing aims to provide an objective, valid assessment of the extent to which the data clean-up and/or migration meets our project goals. The final result should be analysed and assessed against the scope, outcome, and quality defined in the project’s initial phase. Finally, the risks identified in the initial risk assessment should be re-evaluated and managed.

Testing in a structured and controlled manner may seem expensive and time-consuming. However, in our experience, it is even more expensive to detect errors and search for causes and sources randomly and unsystematically. Additionally, testing is important because everyone can make mistakes, and everything can fail. The more critical the data, the higher the cost of poor quality and the consequences of errors.

Regardless of whether structured or unstructured data is cleaned up or migrated, it is important to test the quality. Both categories also contain what can be called “dirty” data. This is data that is subject to various exceptions and does not follow the usual norms and rules. This can particularly occur where free text or user-defined values are allowed. Here, several different names and spellings can appear for company names, countries, etc. In addition to ensuring the quality of the clean-up process, controlled testing will also reveal any errors, deficiencies, and inadequacies in this “dirty” data. This means that an actual enrichment of the data can occur in connection with a clean-up or migration.
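One way to surface such variant spellings is to group values by a normalized comparison key. The following is a minimal sketch, not a complete data-quality tool: `normalize` and `find_variants` are hypothetical helper names, and the normalization rule (lowercase, letters and digits only) is an assumption that would need tuning for real data.

```python
from collections import defaultdict


def normalize(value: str) -> str:
    """Reduce a free-text value to a comparison key (assumed rule:
    lowercase, keep only letters and digits)."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def find_variants(values):
    """Group values that share a normalized key but are spelled differently."""
    groups = defaultdict(set)
    for value in values:
        groups[normalize(value)].add(value)
    # Keep only keys that occur with more than one distinct spelling.
    return {
        key: sorted(spellings)
        for key, spellings in groups.items()
        if len(spellings) > 1
    }


# Illustrative data only.
countries = ["Denmark", "denmark ", "DENMARK", "Sweden", "Den-mark", "Norway"]
variants = find_variants(countries)
for key, spellings in variants.items():
    print(key, spellings)
```

Each group found this way is a candidate for harmonization, which is where the enrichment mentioned above can take place. A rule this simple will miss abbreviations and translations, so it serves as a first pass rather than a full check.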

Overall, testing sharpens our attention to data and data quality: not only to what we expect to deliver, but also to what is actually delivered.



The different stakeholders in the project (business, technicians, migration specialists, testers, etc.) all have different prerequisites. Early in the project, it is important to establish a common understanding of the data foundation, the technology, and the source and target systems of the migration.

The person responsible for testing should be involved from the start of the project, so that they gain a good understanding of the data foundation and can help ensure that the target state is defined in a good requirement specification. In addition, testers should be continuously involved to ensure that it is clear how data is used, how it is handled in clean-up and migration, and how it should look and be used after the project. This insight is necessary for testers to provide qualified input for the requirement specification and to plan and execute high-quality tests.

It is a common misconception that testing is only performed at the very end of the project. Planning for the final test begins at the start of the project, and testing acts as an essential support function throughout.

2. Test Plan

Good testing requires good planning. When all actors are involved in the project from the start, the prerequisites are in place to create a good test plan. The plan should describe the entire testing process.

As mentioned, the testing process begins at the start of the project. When assessing the quality of the cleaned-up or migrated data, one must be able to compare it with the original source data. Therefore, a test scope must be defined, which delimits which data to use in the test. It can be the full dataset or part of it, but it is important that it is comprehensive and representative of the source data. Similarly, we must define in advance how we will select and collect corresponding destination data from the processed data.
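Defining a scope and collecting the matching destination data can be sketched as follows. This is an illustration under stated assumptions: records are identified by a stable ID that survives the migration, and a seeded random sample stands in for whatever selection method the test plan actually prescribes (a stratified sample may be more representative). The function names are hypothetical.

```python
import random


def select_test_scope(source_ids, sample_size, seed=42):
    """Pick a reproducible random sample of record IDs from the source.

    The fixed seed means the same scope is selected on every run.
    """
    ids = sorted(source_ids)
    if sample_size >= len(ids):
        return ids  # the scope is the full dataset
    rng = random.Random(seed)
    return sorted(rng.sample(ids, sample_size))


def reconcile(scope_ids, destination_ids):
    """Report which in-scope source records are missing from the destination."""
    destination = set(destination_ids)
    missing = [i for i in scope_ids if i not in destination]
    return {
        "in_scope": len(scope_ids),
        "found": len(scope_ids) - len(missing),
        "missing": missing,
    }


# Illustrative data: 100 source records, one of which is lost in migration.
scope = select_test_scope(range(1, 101), sample_size=10)
migrated = [i for i in range(1, 101) if i != scope[0]]
print(reconcile(scope, migrated))
```

Fixing the seed and sorting the IDs makes the selection repeatable, so a later rerun of the test compares the same records.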

In addition to having control over the input and output scope and how we collect it, it must also be described how we will trace the transformations that occur on the data. During the clean-up and migration, data will be moved around, and metadata will be cleaned and enriched. This transformation is described in a specification and must be taken into account in a test. It goes without saying that in addition to testing that the data is complete and valid, one must ensure that the manipulations performed on the data are executed correctly.
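One way to test the transformations themselves is to express the specification as executable rules, apply them to a source record, and compare the result with what actually arrived in the destination. The sketch below assumes a simple field-by-field specification; the field names, the country mapping, and the helper names are all hypothetical.

```python
def expected_record(source, spec):
    """Apply the transformation specification to one source record."""
    return {target: rule(source) for target, rule in spec.items()}


def check_record(source, destination, spec):
    """Return (field, expected, actual) for every field that deviates."""
    expected = expected_record(source, spec)
    return [
        (field, value, destination.get(field))
        for field, value in expected.items()
        if destination.get(field) != value
    ]


# Hypothetical specification: target field -> rule computing it from the source.
COUNTRY_CODES = {"Denmark": "DK", "Sweden": "SE"}
SPEC = {
    "customer_name": lambda r: r["name"].strip(),
    "country_code": lambda r: COUNTRY_CODES.get(r["country"].strip().title()),
}

source = {"name": "  Acme A/S ", "country": "denmark"}
good_destination = {"customer_name": "Acme A/S", "country_code": "DK"}
bad_destination = {"customer_name": "Acme A/S", "country_code": "DN"}

print(check_record(source, good_destination, SPEC))
print(check_record(source, bad_destination, SPEC))
```

Keeping the rules in one place means the same specification drives both the expected values and the deviation list, which supports the traceability the test needs.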

3. Execution

The actual test should be carried out in a controlled and reproducible manner, so that the result of the test can be used to go back and repeat the transformation and correct any errors. These errors could be actual errors, or they could be changes based on insights gained from the test.

Explicit test scripts should be created that can be executed step by step, automated where relevant, in one or more runs.
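An automated test script can be as simple as a list of named checks that are run in a fixed order and whose outcomes are recorded. The following is a minimal sketch under that assumption; the checks, the record layout, and the name `run_checks` are illustrative, and a real project might use an established test framework instead.

```python
def run_checks(checks, source, destination):
    """Run every named check and record its outcome.

    A check that raises is recorded as failed, so one broken check
    does not stop the remaining checks from running.
    """
    results = []
    for name, check in checks:
        try:
            passed = bool(check(source, destination))
            detail = ""
        except Exception as exc:
            passed = False
            detail = f"{type(exc).__name__}: {exc}"
        results.append({"check": name, "passed": passed, "detail": detail})
    return results


# Hypothetical checks for a migration of simple {"id", "name"} records.
CHECKS = [
    ("record count matches", lambda s, d: len(s) == len(d)),
    ("no IDs lost", lambda s, d: {r["id"] for r in s} <= {r["id"] for r in d}),
    ("no empty names in destination", lambda s, d: all(r["name"] for r in d)),
]

source_rows = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
destination_rows = [{"id": 1, "name": "A"}, {"id": 2, "name": ""}]

results = run_checks(CHECKS, source_rows, destination_rows)
for r in results:
    print("PASS" if r["passed"] else "FAIL", r["check"], r["detail"])
```

Because the checks are data rather than ad hoc commands, the same script can be rerun unchanged after a correction to confirm that the error is gone.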

4. Test Report

When all test scripts have been run and any errors or inefficiencies have been corrected, the project should be concluded with a report. It is important to consider all elements of the entire testing process in this report, including the definition of scope, collection of test data, transformations, deviations, and, of course, the final test result.

To claim full traceability in the clean-up and migration, it is not enough to document the scope and transformations. It is just as important to be able to document that everything actually went as specified in the documentation. This will be very valuable documentation in any dispute over the validity of documents or data.

The test report is therefore where the project is wrapped up and formally approved as successfully completed.
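The report elements named above can also be captured in a machine-readable summary alongside the written report. The sketch below is one possible layout, not a prescribed format: the field names, the `open` flag on deviations, and the approval rule (no failed checks and no open deviations) are assumptions.

```python
import json
from datetime import date


def build_report(scope, results, deviations, run_date):
    """Assemble scope, check results, and deviations into one summary."""
    failed = [r["check"] for r in results if not r["passed"]]
    return {
        "run_date": run_date.isoformat(),
        "scope": scope,
        "checks_run": len(results),
        "checks_failed": failed,
        "deviations": deviations,
        # Assumed rule: approve only with no failures and no open deviations.
        "approved": not failed and not any(d["open"] for d in deviations),
    }


# Illustrative input only.
report = build_report(
    scope={"source_records": 100, "sampled": 10},
    results=[
        {"check": "record count matches", "passed": True},
        {"check": "no IDs lost", "passed": True},
    ],
    deviations=[{"id": "DEV-1", "description": "trailing spaces", "open": False}],
    run_date=date(2024, 1, 15),
)
print(json.dumps(report, indent=2))
```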