Data Migration Effort Estimation: Practical Techniques from Richard Trapp of Northfield Consulting Group
How do you cope with the uncertainty around data quality on data migration projects, particularly when budgeting and forecasting effort?
Please note, this post first featured on our sister site Data Quality Pro in : How do you estimate impact of poor data quality on a data migration project?
In an excellent discussion on the IAIDQ LinkedIn forum, Richard Trapp attempted to answer this very question, posed by John Platten, a long-time expert panelist on Data Quality Pro and Data Migration Pro.
John Platten of Vivamex opened the discussion up to others, asking for opinions on...
...what to do if you take on a migration lead role and it becomes clear after the contract is signed that the data is too poor, the budget too small...and you realise that the outcome is going to be compromised...
John, I overcome the hurdle you mention by employing a very comprehensive parametric estimating model. All scope assumptions are clearly presented and approved before work starts. When or if assumptions turn out not to be true (number of objects, complexity, environmental constraints, etc.), the impacts on progress are documented and communicated (Earned Value Analysis), and decisions are presented to the client: either reduce scope to meet the current budget, or add budget to meet the change in scope.
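Richard mentions Earned Value Analysis as the way impacts on progress are documented. As a minimal sketch (not NCG's tooling), the standard earned value measures can be computed as below. The budget and progress figures are hypothetical, and all values are assumed to be in the same unit, such as hours.

```python
from dataclasses import dataclass


@dataclass
class EarnedValue:
    """Standard earned value inputs, all in the same unit (hours or currency)."""
    bac: float  # budget at completion: total approved budget
    pv: float   # planned value: budgeted cost of the work scheduled to date
    ev: float   # earned value: budgeted cost of the work actually completed to date
    ac: float   # actual cost of the work completed to date

    @property
    def schedule_variance(self) -> float:
        return self.ev - self.pv  # negative means behind schedule

    @property
    def cost_variance(self) -> float:
        return self.ev - self.ac  # negative means over budget

    @property
    def spi(self) -> float:
        return self.ev / self.pv  # schedule performance index

    @property
    def cpi(self) -> float:
        return self.ev / self.ac  # cost performance index

    @property
    def estimate_at_completion(self) -> float:
        # Assumes cost performance to date continues for the remaining work.
        return self.bac / self.cpi


# Hypothetical status: more objects than assumed has slowed progress.
status = EarnedValue(bac=1500.0, pv=800.0, ev=600.0, ac=800.0)
print(status.schedule_variance, status.cost_variance)  # -200.0 -200.0
print(status.spi, status.cpi)                          # 0.75 0.75
print(status.estimate_at_completion)                   # 2000.0
```

A CPI below 1.0 projects the budget being exceeded, which is the point at which the client would be asked to choose between reducing scope and adding budget.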
Intrigued, I asked Richard to expand and he kindly answered several questions that clarified his earlier comments with a detailed explanation.
Dylan Jones: Richard, great discussion on the data quality forecasting topic. Can you elaborate on what you mean by "comprehensive parametric estimating models"? I’m sure our readers would love to know how they are created, how the client benefits, etc.
Richard Trapp: Sure Dylan.
A fundamental component of any project’s success is the ability to estimate the level of effort and cost of the work that will be performed.
Undertaking a project without a clear articulation of the scope, effort and costs required to deliver the requested work can lead to issues such as:
- Incorrect decisions to proceed or not proceed
- Inefficient use of resources, delivery overruns (cost)
- Lost opportunity (unrealized benefits)
- Missed customer expectations
- Customer dissatisfaction
Estimation Techniques for Data Migration Effort
"Top Down" (Analogous)
Typically used to determine "Rough Order of Magnitude" estimates early in the project lifecycle. Relies on actual durations, effort or costs from previous "similar" projects as a basis for estimating the current project's effort or costs.
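To illustrate the top-down approach, here is a minimal sketch that assumes a single size measure (number of data objects) is a fair proxy for similarity, and that effort scales linearly with it. Both assumptions, the hypothetical `PastProject` record and the figures are for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PastProject:
    name: str
    data_objects: int    # size measure used to judge similarity (an assumption)
    actual_hours: float  # actual effort recorded when the project closed


def analogous_estimate(history: list[PastProject], data_objects: int) -> tuple[str, float]:
    """Pick the past project closest in size and scale its actual effort linearly."""
    if not history:
        raise ValueError("analogous estimating needs at least one past project")
    closest = min(history, key=lambda p: abs(p.data_objects - data_objects))
    scaled = closest.actual_hours * data_objects / closest.data_objects
    return closest.name, scaled


history = [
    PastProject("Project A", data_objects=40, actual_hours=3200.0),
    PastProject("Project B", data_objects=120, actual_hours=8400.0),
]
print(analogous_estimate(history, data_objects=50))  # ('Project A', 4000.0)
```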
"Bottom Up" (Decomposition)
Used to determine definitive estimates. Activity scope is "decomposed" to the lowest meaningful task level (Work Breakdown Structure), to which work effort and costs are assigned. The effort for each task is either arrived at by consensus (see Expert Opinion below) or taken from established standards (see Parametric below) that have been collected and validated over previous projects.
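The roll-up at the heart of a bottom-up estimate can be sketched as a simple tree: effort is assigned to the leaf tasks and summed up through the Work Breakdown Structure. The `WbsNode` type, task names and hours below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WbsNode:
    name: str
    hours: float = 0.0  # effort assigned to a leaf (lowest meaningful task)
    children: list["WbsNode"] = field(default_factory=list)

    def total_hours(self) -> float:
        """Leaf tasks carry effort; summary nodes roll up their children."""
        if not self.children:
            return self.hours
        return sum(child.total_hours() for child in self.children)


wbs = WbsNode("Customer data object", children=[
    WbsNode("Profile", children=[
        WbsNode("Profile source tables", hours=16.0),
        WbsNode("Review profiling results", hours=8.0),
    ]),
    WbsNode("Remediate", children=[
        WbsNode("Design remediation procedure", hours=12.0),
        WbsNode("Execute and verify remediation", hours=20.0),
    ]),
])
print(wbs.total_hours())  # 56.0
```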
Expert Opinion (Delphi)
Used to obtain definitive estimates. Relies on a consensus approach to arrive at an estimate. Typically, a handful of qualified "experts" (including a moderator) meet to discuss the project to be estimated, and then each team member independently produces a task list with effort estimates for each task. The moderator then shares each task list and its estimates anonymously with the entire team, and key assumptions are revisited. This process is repeated until a consensus is reached.
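Real Delphi sessions converge through discussion, so code can only offer a toy model of the iteration. The sketch below assumes, purely for illustration, that after each anonymous round every expert moves halfway toward the group median, and that consensus means the spread of estimates is within 10% of the median. The `delphi_rounds` function and both rules are hypothetical.

```python
from statistics import median


def delphi_rounds(estimates: list[float], tolerance: float = 0.10,
                  pull: float = 0.5, max_rounds: int = 10) -> tuple[int, float]:
    """Toy model of Delphi iteration; returns (rounds taken, consensus estimate).

    Assumption: after each round, every expert revises their estimate by moving
    a fraction `pull` of the way toward the group median. Consensus is declared
    when the spread (max - min) is within `tolerance` of the median.
    """
    current = list(estimates)
    for round_no in range(1, max_rounds + 1):
        mid = median(current)
        if max(current) - min(current) <= tolerance * mid:
            return round_no, mid
        current = [e + pull * (mid - e) for e in current]
    raise RuntimeError("no consensus within max_rounds")


# Three experts' independent estimates (hours) for the same work.
print(delphi_rounds([400, 500, 650]))  # (4, 500.0)
```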
Parametric (Object Based)
Used to obtain definitive estimates. The level of effort for a single activity or task is determined from a standard, which has been established from previous experience. These models allow various parameters to be flexed, such as work driver complexity mixes, risk tolerance levels, team member experience and environmental complexity, to tailor the "standard" to specific requirements. The ability to modify parameters also supports the preparation of "what-if?" scenarios.
My firm, Northfield Consulting Group, employs a Parametric/Bottom Up model that has been harvested and refined over dozens of data quality initiatives. The estimating model comprises hundreds of tasks and dozens of work effort drivers and parameters.
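As a minimal sketch of flexing parameters for what-if scenarios, the model below multiplies a per-object standard by adjustment factors for team experience and environmental complexity, then adds a contingency reflecting risk tolerance. The parameter names, the multiplicative form and the numbers are assumptions for illustration, not NCG's model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Parameters:
    hours_per_object: float    # standard effort per data object, from past projects
    experience_factor: float   # >1.0 for a less experienced team, <1.0 for a seasoned one
    environment_factor: float  # >1.0 for complex or constrained environments
    contingency: float         # risk tolerance as a fraction, e.g. 0.25 = 25%


def estimate_hours(objects: int, p: Parameters) -> float:
    base = objects * p.hours_per_object
    return base * p.experience_factor * p.environment_factor * (1 + p.contingency)


baseline = Parameters(hours_per_object=80.0, experience_factor=1.0,
                      environment_factor=1.0, contingency=0.25)
# Each what-if scenario flexes one parameter and leaves the rest at baseline.
scenarios = {
    "baseline": baseline,
    "junior team": replace(baseline, experience_factor=1.25),
    "constrained environment": replace(baseline, environment_factor=1.5),
}
for name, params in scenarios.items():
    print(name, estimate_hours(20, params))
# baseline 2000.0
# junior team 2500.0
# constrained environment 3000.0
```

Keeping the parameters in a frozen record makes each scenario an explicit, comparable copy of the baseline assumptions, which is useful when those assumptions are presented to a client for approval.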
The end result is a comprehensive view of the work that will be performed and an extremely precise estimate of the associated effort and cost.
Dylan Jones: You mentioned "harvested and refined over dozens of data quality initiatives". What kind of DQ projects were these? Do they vary, or is there a standard project you typically undertake?
Richard Trapp: Northfield Consulting Group offers three consulting solutions: Systems Data Readiness; Assurance & Optimization; and Data Quality Competency Center design and deployment.
While we have separate estimating models for each, I will respond in the context of our Systems Data Readiness solution for purposes of this interview.
NCG’s Systems Data Readiness solution provides the capabilities to inventory, profile, assess and remediate source system data defects that are compromising the process design of existing applications/data warehouses, or which may compromise the intended process design of new systems implementations/data warehouses.
Underpinning our approach is a business value-driven philosophy that ensures focus on areas of maximal impact. This unique perspective – focusing on business value, rather than defect volume – optimizes our clients’ data quality investment.
A typical engagement is to "ready" data in support of a large-scale SAP or Oracle implementation.
Data Quality Pro: I’m assuming that before you can estimate effort you need to have a good understanding of the number of events that will require resolution. How do you arrive at that figure?
Richard Trapp: At the close of each engagement, we compile statistics and run analysis on what was planned/expected vs. what was actually done/realized.
This analysis is done for the work effort drivers/parameters as well as the effort assumptions that are used to calculate level of effort. We then run regression analysis against our previous estimating baselines and refine as needed. We use these new parameters, in conjunction with work effort drivers (e.g. scope), to estimate the new project.
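Richard doesn't detail the regression NCG uses. One simple form, shown here as an assumption, fits a per-driver rate to closed projects' actuals by least squares through the origin (actual hours ≈ rate × driver count), then uses the recalibrated rate for the next estimate. The driver chosen (critical data elements), the `calibrate_rate` function and the figures are hypothetical.

```python
def calibrate_rate(drivers: list[float], actual_hours: list[float]) -> float:
    """Least-squares slope of a line through the origin: actual_hours ~ rate * driver."""
    if not drivers or len(drivers) != len(actual_hours):
        raise ValueError("need one actual-hours figure per driver count")
    numerator = sum(x * y for x, y in zip(drivers, actual_hours))
    denominator = sum(x * x for x in drivers)
    return numerator / denominator


# Hypothetical closed engagements: critical data elements in scope vs. actual hours.
cde_counts = [100, 200, 300]
actual_hours = [650, 1450, 2100]

baseline_rate = 6.0  # hours per CDE assumed in the previous estimating baseline
new_rate = calibrate_rate(cde_counts, actual_hours)
print(f"baseline {baseline_rate} -> recalibrated {new_rate:.2f} hours per CDE")
print(round(250 * new_rate))  # estimate for a new 250-CDE project: 1759
```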
As new estimates are prepared, these scope inputs and parameters are vetted with the customer and any adjustments are made to accommodate customer needs.
Examples of work effort drivers/parameters are: number of data objects, number of tables, number of critical data elements, number of tests that will be performed, test type ratios, number of remediation efforts that will be planned, complexity ratios, etc.
Data Quality Pro: A lot of data quality issues are quite insidious and require things like upstream code changes, staff training etc. How does the model factor in this level of uncertainty?
Richard Trapp: Based on previous experience, we have complexity ratios (or mixes) that are applied to the tasks to be performed (e.g. High, Medium and Low Remediation Procedure Complexity Levels). These complexity mixes drive varying assumptions and levels of effort.
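Some of these drivers are ratios rather than counts. As a hypothetical illustration, a planned test count can be split across test types by ratio, using largest-remainder rounding so that the per-type counts still add up to the total. The test types, ratios and tests-per-element figure are assumptions.

```python
from math import floor


def split_by_ratio(total: int, ratios: dict[str, float]) -> dict[str, int]:
    """Split `total` into integer counts proportional to `ratios` that sum exactly
    to `total` (largest-remainder rounding)."""
    weight = sum(ratios.values())
    raw = {name: total * r / weight for name, r in ratios.items()}
    counts = {name: floor(v) for name, v in raw.items()}
    shortfall = total - sum(counts.values())
    # Hand the leftover units to the categories with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda name: raw[name] - counts[name], reverse=True)
    for name in by_remainder[:shortfall]:
        counts[name] += 1
    return counts


critical_data_elements = 37
tests_per_cde = 3                              # hypothetical planning assumption
tests = critical_data_elements * tests_per_cde  # 111 tests in total
print(split_by_ratio(tests, {"completeness": 5, "validity": 3, "consistency": 2}))
# {'completeness': 56, 'validity': 33, 'consistency': 22}
```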
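A complexity mix can be turned into effort as a weighted average of per-level standards. In this sketch the hours per level and the mixes are hypothetical; swapping in a mix weighted toward High complexity shows how the same number of remediation procedures drives a larger estimate.

```python
def blended_hours(task_count: int, mix: dict[str, float],
                  hours_by_level: dict[str, float]) -> float:
    """Effort for `task_count` tasks whose complexity follows `mix` (shares summing to 1)."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("complexity mix must sum to 1")
    per_task = sum(share * hours_by_level[level] for level, share in mix.items())
    return task_count * per_task


hours_by_level = {"High": 40.0, "Medium": 16.0, "Low": 4.0}  # hypothetical standards
typical_mix = {"High": 0.25, "Medium": 0.5, "Low": 0.25}
messier_mix = {"High": 0.5, "Medium": 0.25, "Low": 0.25}

print(blended_hours(12, typical_mix, hours_by_level))  # 228.0
print(blended_hours(12, messier_mix, hours_by_level))  # 300.0
```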