Migration and Conversion – A Simple Approach


  • Larry Davis
  • Viewpoint
  • June 6, 2016

The Basics

So, what are you doing? A conversion or a migration?

Well, the answer will be surprising to some. In almost all cases, it’s both. We’ve been using these two terms interchangeably for some time now, but they mean completely different things. Data conversion is the transformation of data from one format to another, basically your Extract, Transform, and Load of the data.

Data migration is moving data from a source system to a destination system, and it will almost always include data conversion. Data migration also includes activities like data cleansing, data validation, and ensuring data quality in the destination system.

The software available for conversions is almost endless, and evaluating a product’s fitness for your objectives can be a time-consuming process. The range is seemingly infinite, from homegrown tools to shrink-wrap solutions to every kind of customization in between. While the tools may be endless, a common approach can be applied to most conversions.

This article covers the basic approach for a project that involves converting data from legacy systems into a Quote-to-Cash cloud platform, Salesforce. To avoid getting bogged down in a deep understanding of the data, the process and steps are detailed using a data set common to most of us and abundant in many systems: contacts. To further simplify an often daunting task, this conversion focuses on using Kettle to perform the data conversion.

The Process

Our objectives for any data migration and conversion are:

  1. Ensure that all the data required by the application will be available and accurate
  2. Analyze the existing data sources
  3. Design a technical approach for extracting, manipulating, and populating data into the target database
  4. Identify and set up conversion tools for improving data conversion efficiency and managing risks

There are many ETL tools available that support our process. Two that are highly recognized in the industry are Pentaho’s “Kettle” ETL tool and the Apex Data Loader for SFDC. Kettle is known for its ease of use, its fit with our source and destination systems, a breadth of online support, flexibility, and a large support community. Being an open source tool, the price was right, too. The selection process was minimal, as we’ve previously used it on other projects.

Following the KISS method, we broke the migration process down into separate functions. During the Data Conversion Approach phase, we determined the best method was to organize the functions around these data types:

  • Accounts
  • Addresses
  • Contacts
  • Contracts
  • Line Items

By compartmentalizing the data, we’re able to quickly make changes, easily re-use components, and keep maintenance to a minimum. It also allows for a methodical approach to executing the migration: each process can be evaluated for success before moving on to the next, as sketched below.
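
As a minimal sketch of that idea, here is what the run order and the evaluate-before-continuing rule might look like, assuming each data-type process can be wrapped in a callable that reports success. In the real project the processes are Kettle scripts and Apex Data Loader runs, not Python functions, so this is illustration only.

```python
# Illustrative sketch: run the compartmentalized processes in order and stop
# when one fails. Names and the pass/fail convention are assumptions.
def run_migration(processes):
    """processes: ordered list of (name, callable) pairs, e.g. Accounts first,
    then Addresses, Contacts, Contracts, and Line Items."""
    for name, step in processes:
        ok = step()
        print(f"{name}: {'passed' if ok else 'FAILED'}")
        if not ok:
            return False  # evaluate each process before moving on to the next
    return True
```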

For each of the functions above, there is an input, a process, and an output. When the Contacts function is completed, for example, all the recognized contacts from the source system will be in SFDC on the correct account.

Inputs can be either a mapping document or an SQL extract statement. A mapping document in this case maps source account IDs to destination account IDs, while an SQL extract pulls data from the source system to be used in the translation. The process is either a Kettle script or another form of software/script execution (in our case, the Apex Data Loader for SFDC). The output can be a file (which becomes the input for another process) or an SQL insert statement.
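
As a rough illustration of this input/process/output pattern, here is a minimal Python sketch. The file name source_to_sfdc_accounts.csv, the column headings, and the sample extract SQL are all assumptions chosen for illustration, not the project’s actual artifacts, and the real processing is done by Kettle and the Apex Data Loader.

```python
import csv

# Input: a mapping document of source account IDs to destination (SFDC) account IDs.
def load_account_map(path="source_to_sfdc_accounts.csv"):
    with open(path, newline="") as f:
        return {row["SourceAccountId"]: row["SfdcAccountId"]
                for row in csv.DictReader(f)}

# Input: an SQL extract statement run against the source system (engine-specific
# connection code omitted; assume it returns rows as dictionaries).
CONTACT_EXTRACT_SQL = """
SELECT account_id, first_name, last_name, email
FROM contacts
WHERE status = 'ACTIVE'
"""

# Output: a file that becomes the input to the next process.
def write_stage_file(rows, path, fieldnames):
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
```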

Kettle Script

The purpose of the script is to extract contacts from the source system, determine whether each contact already exists in the destination system, and provide the appropriate output file for either updating the existing contact or creating those that don’t exist in SFDC. In addition, we add the SFDC record type. The account ID and the email address are used to determine unique records.

The ETL script has four inputs (two extracts, one each from the source and destination systems, and two mapping files). The file outputs are the contacts to be updated or created in SFDC, which become the inputs to the Apex Data Loader process.

To dive deeper into this Kettle script, here’s what it does.

The first section of the Kettle script performs these functions:

Step | Purpose
Extract Contact Data from Source | An SQL extract of contacts from the source system
Source_ID_CSV File Input | Mapping file of source account IDs to destination account IDs
Source2SFDC_ID | Adds the destination account ID to the data stream
Email to Lower Case | Forces email addresses to lower case for later matching
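
In plain Python, the logic of this section looks roughly like the following. The column names (account_id, email, SfdcAccountId) are assumptions, and in the actual project these are Kettle transformation steps rather than hand-written code.

```python
def transform_source_contacts(extract_rows, account_map):
    """extract_rows: rows from the source-system contact extract.
    account_map: source account ID -> SFDC account ID (the Source_ID_CSV input)."""
    out = []
    for row in extract_rows:
        sfdc_id = account_map.get(row["account_id"])      # Source2SFDC_ID
        if sfdc_id is None:
            continue  # unmapped accounts would be routed to an audit/reject file
        row = dict(row, SfdcAccountId=sfdc_id)
        row["email"] = (row.get("email") or "").lower()   # Email to Lower Case
        out.append(row)
    return out
```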

The second section of the Kettle script performs these functions:

Step | Purpose
Salesforce Input | Export of all contacts from the destination system
Email to Lower Case | Forces email addresses to lower case for later matching
Stream Lookup | Matches source and destination contacts
Contact Record Type File | Provides a mapping of the record types to be used
Lookup RecordTypeID | Associates the record type with the contact record
Add to Process | Adds a specific data field required by the destination system
Sort Rows | Sorts the rows for the next step
Unique Rows | Ensures that there is a unique contact per email per account
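
The lookup and de-duplication logic in this section can be sketched in Python roughly as follows. The Salesforce field names, the record-type key, and the Division field used for the “Add to Process” step are illustrative assumptions only.

```python
def match_and_dedupe(source_rows, sfdc_contacts, record_type_map):
    """source_rows: output of the first section; sfdc_contacts: the Salesforce
    Input export; record_type_map: the Contact Record Type file loaded as a dict."""
    # Salesforce Input + Email to Lower Case: index existing contacts by (account, email)
    existing = {(c["AccountId"], (c.get("Email") or "").lower()): c["Id"]
                for c in sfdc_contacts}

    out, seen = [], set()
    # Sort Rows: put the stream in a deterministic order before taking unique rows
    for row in sorted(source_rows, key=lambda r: (r["SfdcAccountId"], r["email"])):
        key = (row["SfdcAccountId"], row["email"])
        if key in seen:
            continue  # Unique Rows: one contact per email per account
        seen.add(key)
        row["ExistingContactId"] = existing.get(key)           # Stream Lookup
        row["RecordTypeId"] = record_type_map.get("Contact")   # Lookup RecordTypeID
        row["Division"] = "Migrated"                           # Add to Process (assumed field)
        out.append(row)
    return out
```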

And, finally, the last section of the Kettle script performs these functions:

Step | Purpose
Filter Rows | Splits the data (contacts) based on whether the record already exists in the destination system
Update Contact for SFDC | File ready for the Apex Data Loader to update specific fields in the destination system
Create Contact for SFDC | File ready for the Apex Data Loader to create new contacts in the destination system
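
The final split can be pictured as the following Python sketch. The output file names and column lists are assumptions chosen for illustration; the real format is whatever the Apex Data Loader field mapping expects.

```python
import csv

def write_loader_files(rows, update_path="update_contacts.csv",
                       create_path="create_contacts.csv"):
    # Filter Rows: split on whether the contact already exists in the destination system
    updates = [r for r in rows if r.get("ExistingContactId")]
    creates = [r for r in rows if not r.get("ExistingContactId")]

    def dump(path, data, fields):
        with open(path, "w", newline="") as f:
            w = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
            w.writeheader()
            w.writerows(data)

    # Update Contact for SFDC: the existing record Id plus the fields being refreshed
    dump(update_path, updates, ["ExistingContactId", "email", "RecordTypeId"])
    # Create Contact for SFDC: new contacts keyed to the mapped account
    dump(create_path, creates,
         ["SfdcAccountId", "first_name", "last_name", "email", "RecordTypeId"])
```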

Audit Files

Notice all the audit files? We asked the auditors what they needed to satisfy their requirements for migrated data. They wanted to know what was extracted, what was modified (transformed), and what was loaded. We had this clearly covered: we had already included auto-generated audit files for the four extracts (when the input is an SQL statement), after each transformation, and, where the output is not already a file, another audit file was generated. Not only did this satisfy the auditors, it made troubleshooting data anomalies very easy.
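
One simple way to auto-generate that kind of audit trail, sketched here in Python with an assumed directory and naming scheme, is to snapshot the row set after every extract and transformation:

```python
import csv
import datetime
import os

def write_audit(rows, step_name, audit_dir="audit"):
    """Snapshot the row set after an extract or transformation for the audit trail."""
    os.makedirs(audit_dir, exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    path = os.path.join(audit_dir, f"{stamp}_{step_name}.csv")
    with open(path, "w", newline="") as f:
        if rows:
            w = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            w.writeheader()
            w.writerows(rows)
    return path  # e.g. audit/20160606_101500_email_to_lower_case.csv (assumed layout)
```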

Summary

It’s important to keep things simple, for many reasons. In this case, we’re able to make script and mapping changes without impacting any other part of the process. This also allows us to test and validate a single script without having to run a full migration. In addition, as we continue to improve the scripts and bring on source data from other systems, each process is re-usable with minor refactoring.

Our client was highly impressed with the quality of the data, how quickly the entire process ran, and the benefits of re-usability for future migrations.

Larry Davis is a Director in ATG’s SQS, where he is responsible for testing practice leadership and delivery excellence. He has over 30 years of experience in the telecommunications industry, including over 15 years in professional consulting services.