How to rescue a failing data migration project [Part 1 - Stabilise]
Monday, July 6 in Data Migration Methodology, Data Migration Techniques
Many members contact us with tales of data migration failure, and one only needs to look in the computing press to read about the high-profile data migration disasters that litter our industry.
We recently contacted John Morris, managing director of iergo and author of Practical Data Migration, for some advice on how our readers can help put their data migration project back on track if it is starting to slip off the rails.
John kindly wrote the following series of articles, providing a range of detailed tips and techniques to help you rescue your sinking project.
How to rescue a failing data migration project
If we look at the statistics then it's clear that a great many data migration projects either fail or fall far short of their intended goals. Data migration failures are therefore far more common than successes but it need not be this way.
In this article I'm going to show you a simple workflow I typically adopt when asked to rescue data migration projects in crisis.
There are typically three main phases I look to implement:
- Stabilise
- Plan
- Mop-up
This week we will focus on the process of stabilisation; next week we will explore the Plan and Mop-up activities.
Stabilisation
This phase rapidly prevents the current situation from getting worse. We aim to address the ineffective working practices and firefighting by creating an environment where a more considered approach can be adopted. We may also need a great deal of tact and diplomacy to defuse internal politics and acrimonious disputes that may have evolved during the project.
Here are some pointers for getting your project stable:
Assess the current status of the project
At Iergo we have developed a comprehensive risk assessment that covers the following main areas:
- Data Architecture
- Business Engagement
- Programme Governance
- Policies
- Migration Delivery
- System Retirement policies
- Key Data Stakeholder Analysis
- Data Quality Rules
We have integrated this risk assessment into a rapid questionnaire which we can apply in hours if we are really pushed, but which we prefer to take longer over, typically no more than two weeks.
If this is your programme that is slipping you will be familiar with the issues, but it is still worthwhile going through the formal checklist to make sure that, in the blinkered "group-think" that develops as projects fail, you aren’t mistaking the symptoms for the causes.
There are some caveats with this process:
- The iergo list is both comprehensive and measured against industry statistics, which allows us to produce an analysis of impact and potential overrun in terms of cost and time. If you are doing the same in-house then you will have to be pragmatic and use your common sense to work out the impacts
- Although this list covers the same areas it is less detailed than the one we use
- The list has to be applied with a sensitivity to circumstance. We weight the results depending on local Policies (see below) and the circumstances of the programme
- I would use it as an aide-mémoire rather than as a fixed template set in stone
A properly run data migration must cover off all these areas. The missing elements tend to be in the Business Engagement (BE) and Key Data Stakeholder Analysis (KDSA) areas but their impact is usually seen in Migration Delivery. When we are called in to look at failing projects, our attention is nearly always directed to Migration Delivery but the problems typically have their genesis in BE or KDSA.
Within each category look at the following:
Data Architecture
- Landscape Analysis – was it adequately performed? Were the results channelled into the DQR process (see below)? Were they reflected in the Design? The impulse to "just get on and do something" is often overpowering. The results of poor (or non-existent) Landscape Analysis are usually found in the vast number of data issues that flood out of the migration engine at run time and swamp the programme
- Metadata understanding – we use a selection of models (Migration Models, Legacy System Models, Target System Models, Conceptual Entity Models) to understand our migrations. Do you have a common understanding across the plethora of systems that make up your sources?
- Master Data Consolidation – what are the key master data items your migration needs to manage? Is it customers or products or personnel? Often in the legacy data, patchy updating is overcome by operating procedures ("Always go to system x when you need the latest phone number, not system y, because it gets out of date" for instance). However, when you try to put it together in a migration, mismatches occur all over the place
Business Engagement – here we cover the more formal aspects of BE:
- Communication Strategy – who is responsible for creating the BE strategy and the methods for getting messages out? How engaged are they in the programme? How aligned are your needs (especially important when a programme is failing) with the mechanisms for communicating? Urgent messages such as failed updates or emergency workarounds are not suitable for cascade briefings, for instance, and may not get briefed at all
- Data Transitional Rules – in any data migration of consequence there is one set of operating procedures prior to the migration and another set after the migration, but there is often a third set specifically to cover special processing during the migration. The most common example is the treatment of transactions that start prior to the migration but end after it – the so-called "in-flights". How are you recording, developing, policing and briefing your Data Transitional Rules? Are they being followed? In the current trend towards progressive migrations as opposed to Big Bang, failure to create and follow appropriate Data Transitional Rules will lead to cumulative data errors in the target and source. Often these look like errors in the migration software but aren’t
- Training Plan – It should go without saying, but are all your users trained in the new systems at an appropriate time? Train too early and we forget what we’ve learned. Train too late and we’ve already messed up the new system. The so-called "Training Lag" is a real inhibitor on large migrations. Are you sure that all your people know what they are to expect?
- Business Re-organisation – Large organisations are in a constant state of flux. Often the migration is a result of these changes (think of mergers or de-mergers). However, the two programmes can get out of step, with a delayed migration having to be managed into a partially transformed business environment. And there can always be business re-organisations that have nothing to do with the data migration but which severely impact it. How many of your issues come down, at base, to simply not having the right organisation in place?
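The "in-flights" test described under Data Transitional Rules can be written down quite precisely, which helps when briefing it. The following is only a minimal sketch in Python: the `Transaction` shape, the field names and the cutover time are all assumptions made for illustration, and a real legacy system will need its own definition of "started" and "completed".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Transaction:
    # Hypothetical record shape; real legacy records will differ.
    ref: str
    started: datetime
    completed: Optional[datetime]  # None means the transaction is still open


def is_in_flight(txn: Transaction, cutover: datetime) -> bool:
    """True if the transaction starts before cutover and is not finished by then."""
    if txn.started >= cutover:
        return False
    return txn.completed is None or txn.completed > cutover


# Assumed cutover time and sample data, for illustration only.
cutover = datetime(2024, 3, 15, 18, 0)
transactions = [
    Transaction("T1", datetime(2024, 3, 10), datetime(2024, 3, 12)),    # done before cutover
    Transaction("T2", datetime(2024, 3, 14), None),                     # open across cutover
    Transaction("T3", datetime(2024, 3, 15, 9), datetime(2024, 3, 16)), # spans cutover
    Transaction("T4", datetime(2024, 3, 17), None),                     # starts after cutover
]

in_flight_refs = [t.ref for t in transactions if is_in_flight(t, cutover)]
print(in_flight_refs)  # ['T2', 'T3']
```

Whatever special handling the Data Transitional Rule prescribes then applies only to the records this test selects, which makes the rule easier to police.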
Governance – One would hope that after a dozen years of Prince etc. we would have a handle on what constitutes proper governance on a data migration programme. I wish that were so. This is not a comprehensive checklist but it will give you a start:
- Do you have a clear scope statement?
- Do you have a Risks and Issues Log? Is it up to date with a process that works?
- Is your Programme Management Office (PMO) functioning?
- Do you have a change-controlled, up-to-date plan?
- Are your various project and programme boards in place? Do the required linkages up and down the management structure work properly?
- Do you have visibility of the state of the programme and the current pressing problems?
- Do you have the budget under control? Both actual and committed spend?
Policies – these are the high level drivers that shape and inform the scope, but they can conflict. Going as fast as possible can conflict with maximising data quality, for instance. Has senior management been walked through the policies (some of which are often tacit)? Is there a conflict resolution process? (At a high level, policy conflict resolution is part of the governance activities; day-to-day conflict resolution should be built into the migration programme’s low-level tasks.)
Some common policies include:
- Strategic Architectural Alignment – the drive to conform to a strategic architecture may inhibit quick fixes and workarounds that would get us to the solution
- Master Data Management (MDM) – covered above but also consider the common situation where the strategic MDM solution (like the CRM solution for mastering Customer Data) is not the best available source of data. This leads to user resistance to the solution and frequent backtracking to get the "right" data
- Hard Stop Flexibility – crucial to understanding what you can do in terms of lengthening the programme to get the optimal result
- Regulatory Constraints – these can often be your friends in problematic migrations, but don’t overplay the use of the Data Protection Act or SOX compliance as a stalling mechanism as you try to get your migration back on track
Migration Delivery – this is the build, test and execution of the migration. In my experience, although it is nearly always running late and is therefore inadequately tested, it is rarely the nuts and bolts of the migration that cause failure. It just seems that way, as all the other issues that should have been resolved elsewhere cause it to fail. In any case there are enough books and articles out there on Software Engineering to render a short piece like this redundant, but:
- Appropriate Tool Selection: Have you chosen an appropriate tool for your migration? Not that there’s often a lot you can do about a poor selection that is backed by senior management. However it should influence your calculations of time to fix
- Non-Functional Design: So often under-estimated, although with a migration failing it tends to be pretty obvious by now. Can you get the throughput, end to end, that your migration needs? What are the bottlenecks? Will the smart use of overtime or extended run times help? Can you design out the pinch points, either technically or by the use of a Data Transitional Rule?
- Fallout Management: Do rejected records fall elegantly into a pre-designed process or do they fly out into a chaotic group of frantic technologists for non-planned and non-audited data hacking to fix? Don’t however mistake a prettily designed fallout reporting tool for a proper fallout management solution. Is there any substance behind the facade? The quickest impact you can make to a failing project is to get a grip here but with the awareness that most of the problems you see here will have their origin in failed activities elsewhere in this check list. Use your actions here not just to address the immediate but to start recovering these other failures. If you don’t you will be in for an awful lot of firefighting
- Fallback Policy: Every well designed migration should have one. I guess if you are reading this in anger then you will be knee deep in yours and it’s a bit late for me to ask how adequate you are finding it but on future projects it really does pay to design a fallback policy early in the project lifecycle
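To make the Fallout Management point concrete, here is a rough Python sketch of rejected records dropping into a pre-designed process rather than into ad hoc fixing. The reason codes, queue names and the `Rejection` shape are hypothetical, invented for this example; the only idea taken from the checklist is that every rejection lands in a known, accountable queue, and that anything unrecognised goes to triage rather than to whoever happens to be nearest.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rejection:
    record_id: str
    reason_code: str  # hypothetical code emitted by the migration engine


# Hypothetical routing table: reason code -> queue that owns the fix.
ROUTING = {
    "MISSING_MANDATORY": "business-domain-experts",
    "DUPLICATE_KEY": "master-data-team",
    "FORMAT_ERROR": "migration-developers",
}
TRIAGE_QUEUE = "triage"  # unknown reasons are triaged, never silently dropped


def route_fallout(rejections):
    """Group rejected record ids by the queue responsible for resolving them."""
    queues = defaultdict(list)
    for rej in rejections:
        queue = ROUTING.get(rej.reason_code, TRIAGE_QUEUE)
        queues[queue].append(rej.record_id)
    return dict(queues)


rejections = [
    Rejection("R1", "DUPLICATE_KEY"),
    Rejection("R2", "FORMAT_ERROR"),
    Rejection("R3", "UNKNOWN_CODE"),
    Rejection("R4", "DUPLICATE_KEY"),
]
queues = route_fallout(rejections)
for name, record_ids in sorted(queues.items()):
    print(name, record_ids)
```

The size of each queue over time is also a cheap indicator of which upstream failure (Landscape Analysis, Data Quality Rules and so on) is generating most of the fallout.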
System Retirement Policies (SRP) – these are user facing documents that describe in business terms how the legacy systems are to be decommissioned and what degree of re-assurance the Key Data Stakeholders are going to get that their part of the business will continue to function post migration. In our experience these are rarely in evidence when we parachute into failing programmes. More commonly there is a series of guerrilla engagements between the "techies" and the business, each trying to browbeat the other with the overwhelming force of senior business sponsorship, usually by way of a totemic reference to ruling Policy. This responsibility gap, which I’ve commented on elsewhere, widens until the project plunges into it. It is symptomatic of a failure to resolve Policy conflicts and a failure to complete Key Data Stakeholder Analysis adequately. System Retirement Policies should include detailed sections on:
- Audit – how does the business user know that all the essential items have been successfully moved?
- History requirements (especially for data items that are NOT part of the new design but essential to business processes)
- Business Migration Restrictions – here is where you record all the business side restrictions – maybe Key Performance Indicators that must be met or busy work periods – that might be compromised by the migration
- Training requirements (touched on above)
- Business continuity – what are the business-side restrictions (that must be fulfilled) that will constrain your fallback policy?
- Reasons to say "No" – the most significant part of an SRP. What are the show-stopper issues that must be resolved before the migration can be signed off? At this point you will be aware of some of them: the ones being presented to you right now. But are there others? Will you resolve one lot of restrictions only to have a second lot presented to you? Remember, at this point the user community is running scared. No one likes change and this change is going awfully badly. There will be layer upon layer of objections. You need to get them all out in the open, resolved or mitigated and signed off
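For the Audit item in the list above, one simple form of evidence a business user can follow is a key-level reconciliation between source and target. The Python sketch below is illustrative only: the function name, the key values and the idea of an agreed exclusion list (records the business has signed off as "don’t migrate") are assumptions, not part of any particular SRP template.

```python
def reconcile(source_keys, target_keys, excluded_keys=()):
    """Compare business keys in source and target, allowing for agreed exclusions."""
    source = set(source_keys)
    target = set(target_keys)
    expected = source - set(excluded_keys)  # what should have arrived in the target
    return {
        "missing": sorted(expected - target),     # expected but not in the target
        "unexpected": sorted(target - expected),  # in the target with no matching source
        "balanced": expected == target,
    }


# Hypothetical customer keys, for illustration only.
report = reconcile(
    source_keys=["A", "B", "C", "D"],
    target_keys=["A", "B", "E"],
    excluded_keys=["D"],  # agreed with the Data Owner as not to be migrated
)
print(report)
```

A count-and-key check like this does not prove the content of each record is right, but it gives the Data Owner a plain answer to "did everything that should have moved actually move?".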
Key Data Stakeholder Analysis (KDSA) – It is almost always the case, when we arrive at the site of the disaster, that this step has been done badly. KDSA is not a RACI spreadsheet. If that’s all you have – well, it’s a start, but the passive activity of approving a design is a long way short of the active responsibility of specifying the SRP. It’s really an invitation to accept the power to disapprove without the responsibility to resolve. A proper set of Key Data Stakeholders should include:
- Data Owner – the Data Owner is NOT the person in the IT department’s filing system with titular responsibility for the data stores you are migrating from or to. A Data Owner is any person within the organisation with the de facto authority to prevent a migration from occurring. These are the people who must sign off the decommissioning certificate. They each should have completed an SRP
- Business Domain Expert – this is not a technical role. This is a business person with day to day access to the system. They understand what it means in business terms. On very large systems you normally have a smaller set of Business Domain Experts but ones that are trusted by their colleagues and can reach out to the appropriate person in the business to answer any question
- Technical System Expert – normally you are knee deep in these, at least for the target system but don’t confuse them with the Business Domain Experts unless they also have day to day, hands on, experience of the systems you are migrating
- Regulatory and other – there are a lot of potential Key Data Stakeholders but each one should have a clearly defined role
Data Quality Rules – these are both a set of artefacts and the process for managing all data related issues within a migration project. If you are following the method preached in my book Practical Data Migration then you will have set them up. If not, you will still (probably) have some mechanism for investigating, prioritising and resolving data issues, even if the prioritisation is the usual mix of issue management, technical feasibility, peer pressure and generally who shouts the loudest. (Often, when programmes are really under pressure to deliver something, prioritisation becomes a matter of what can be delivered first.) Check your process for the following necessary features:
- Method – is there one clearly defined method to handle all data related issues? One that isn’t confused with the development/testing issues of the new stack? Are all your issues in one place or are they scattered around, some in the testing suite, some in fallout, some on the issues log etc?
- Participation – Does the data issues process fully include the Data Owner, Business Domain Experts and other Key Data Stakeholders in both prioritisation and delivery?
- Scaling – Are all of your known data issues fully quantified? Can you report precisely for each issue your percentage complete on delivery?
- Prioritisation – Is prioritisation driven by the active participation of the Business Domain Experts, sanctioned by the Data Owners? Not as sign-offs but as an informing part of the decision process? Does prioritisation include the option to do nothing and allow existing "bad" data to go over without improvement?
- Delivery – Are all available options considered? That is:
  - Fix in the source
  - Fix in the migration software
  - Allow to fall out and load manually
  - Migrate and fix on the target
  - Migrate and leave as is
  - Don’t migrate
- Decision-Making – Are the Data Stakeholders party to the delivery decision making? Or is it decided by a cabal of "techies" then presented as a done deal to the business? Where remedial actions are being considered, are they controlled and reported on as part of a methodical approach that can be measured?
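The Scaling and Delivery checks above imply a single register in which every data issue is quantified and carries an explicit delivery decision. As a rough sketch only (the class names, fields and issue ids are hypothetical and are not taken from Practical Data Migration), such a register might look like this in Python:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Delivery(Enum):
    # The six delivery options listed above.
    FIX_IN_SOURCE = "fix in the source"
    FIX_IN_MIGRATION = "fix in the migration software"
    FALLOUT_MANUAL_LOAD = "allow to fall out and load manually"
    FIX_ON_TARGET = "migrate and fix on the target"
    MIGRATE_AS_IS = "migrate and leave as is"
    DO_NOT_MIGRATE = "don't migrate"


@dataclass
class DataIssue:
    issue_id: str
    description: str
    records_affected: int
    records_resolved: int = 0
    delivery: Optional[Delivery] = None  # None means no decision taken yet
    approved_by: Optional[str] = None    # the Data Owner who sanctioned the decision

    def percent_complete(self) -> float:
        if self.records_affected == 0:
            return 100.0
        return 100.0 * self.records_resolved / self.records_affected


def overall_percent_complete(issues) -> float:
    """Record-weighted completion across the whole register."""
    total = sum(i.records_affected for i in issues)
    if total == 0:
        return 100.0
    return 100.0 * sum(i.records_resolved for i in issues) / total


# Sample issues, invented for illustration.
register = [
    DataIssue("DQR-001", "Customers with no postcode", 1000, 250,
              Delivery.FIX_IN_SOURCE, approved_by="Head of Sales"),
    DataIssue("DQR-002", "Duplicate product codes", 200, 200,
              Delivery.FIX_IN_MIGRATION, approved_by="Product Manager"),
    DataIssue("DQR-003", "Orphaned order lines", 40),
]

undecided = [i.issue_id for i in register if i.delivery is None]
print(register[0].percent_complete())      # 25.0
print(overall_percent_complete(register))  # 450 of 1240 records resolved
print(undecided)                           # ['DQR-003']
```

Keeping the decision and its approver on the same record as the counts is what lets you answer the Participation and Decision-Making questions from the register rather than from memory.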
Next Week:
I will focus on the Plan and Mop-up phases. If you have any questions so far, please post in the comments section below.
Resources
Iergo Free Data Migration Training Centre
Practical Data Migration by John Morris
Presentation that includes the 4 golden rules of data migration
Data migration project checklist - a template for more effective data migration planning
The data migration go-live strategy - what is it and why does it matter?
Does your data migration strategy have an effective plan for systems retirement?

Reader Comments (1)
If anyone wants to read more about the most commonly found problems and their best solutions in planning and execution, check out our company's blog:
http://qtility.wordpress.com/
and if you are on Facebook, join our Data Migration Group:
http://www.facebook.com/group.php?gid=117121664060&ref=
Matt Clarke