Hi, I have 10 sites. They all use different databases: Lifeline, Raiser's Edge, Excel, Access, Exceed Basic, Pledgemaker, etc., each with its own naming conventions for fields and its own table names. I want to migrate the data from all of these into a centralised database. The first step is to decide what kind of database should be used as the centralised database; after that, come up with a schema for it and migrate the data from all the different databases mentioned above into it. Can anyone help me?
Well, there are a number of steps to take.
First, know your source. From the Bloor report (Data Migration 2011) to my own writing, we all stress profiling, or in-depth analysis, of the sources. Almost every data source will have some errors in it, as well as problems relating one source to another. Find out how the database is supposed to work and create queries for every rule to find all the exceptions (e.g. foreign key violations, range value violations). Don't forget to perform what I call the reality check: even if the data looks right and follows the rules, does it match up to business reality? There are plenty of systems out there (probably the majority) where important entities are missing altogether.
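To make the profiling step concrete, here is a minimal sketch of rule-exception queries. The table names, columns, and rules (a donations table with a foreign key to donors, and an "amounts must be positive" rule) are entirely hypothetical, standing in for whatever rules your own sources are supposed to obey:

```python
import sqlite3

# Hypothetical legacy schema, loaded with one deliberate error of each kind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE donor (donor_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE donation (donation_id INTEGER PRIMARY KEY,
                           donor_id INTEGER, amount REAL);
    INSERT INTO donor VALUES (1, 'Alice');
    INSERT INTO donation VALUES (100, 1, 50.0);   -- valid
    INSERT INTO donation VALUES (101, 2, 25.0);   -- orphan: donor 2 missing
    INSERT INTO donation VALUES (102, 1, -10.0);  -- range violation
""")

# Foreign-key exception: donations whose donor does not exist.
orphans = conn.execute("""
    SELECT d.donation_id FROM donation d
    LEFT JOIN donor dr ON dr.donor_id = d.donor_id
    WHERE dr.donor_id IS NULL
""").fetchall()

# Range-value exception: amounts that break the "must be positive" rule.
bad_amounts = conn.execute(
    "SELECT donation_id FROM donation WHERE amount <= 0"
).fetchall()

print(orphans)      # [(101,)]
print(bad_amounts)  # [(102,)]
```

One query per rule, run over every source, gives you the exception counts that feed the error log discussed next.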
Second, make sure you record all the errors you find in one place. Accept that there will be more of them than you have time to fix, so you need to involve both the technical and business sides in a process to prioritise what you fix and what you don't.
Third, once you get a grip on each legacy system in its own terms, it's time to start comparing across them. Here you need a common model. I favour formal data modelling techniques for this, but if you are mid-stream on this project, it is probably not the time to start learning them. An alternative is to take the most likely or most comprehensive database available to you (the target if it exists, one of the legacy systems, or a composite crafted from a blend of the available legacy systems) and start matching each legacy system to this Migration Model. The benefit of formal methods here is that they make it cheaper and quicker to spot structural differences. I have no idea what industry you are in, but from experience it doesn't matter whether you are in the loans or the automobile business: if you have 10 sources, you are likely to have 10 structurally different representations of your products, customers, etc.
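The matching exercise above often starts life as nothing more than a per-source lookup table onto the Migration Model. The source names and field names below are invented for illustration; yours will come from the profiling work:

```python
# Hypothetical sketch: map each legacy system's field names onto one
# Migration Model, so every comparison is legacy-vs-model, never
# legacy-vs-legacy. Source keys and field names are made up.
FIELD_MAP = {
    "raisers_edge": {"ConsID": "customer_id", "FirstName": "first_name"},
    "pledgemaker":  {"donor_no": "customer_id", "fname": "first_name"},
}

def to_model(source: str, record: dict) -> dict:
    """Rename a legacy record's fields to the Migration Model's names,
    dropping any field the model does not (yet) account for."""
    mapping = FIELD_MAP[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

result = to_model("pledgemaker", {"donor_no": 42, "fname": "Ada"})
print(result)  # {'customer_id': 42, 'first_name': 'Ada'}
```

Fields that fall through the mapping are exactly the ones worth taking back to the business side, because they signal either a gap in the model or a structural difference.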
And when it comes to customers, you will need to de-duplicate both within each legacy source and across them, so that you don't create more than one representation of the same customer in the target.
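A first pass at that de-duplication can be as simple as grouping records on a normalised key; the fields chosen here (name plus postcode) and the sample records are purely illustrative, and real matching usually needs fuzzier rules:

```python
from collections import defaultdict

def dedupe_key(rec: dict) -> tuple:
    """Normalise the fields used for matching so trivial formatting
    differences (case, spacing) don't hide duplicates."""
    return (rec["name"].strip().lower(),
            rec.get("postcode", "").replace(" ", "").upper())

# Invented sample records from three different legacy sources.
records = [
    {"source": "lifeline",    "name": "Jane Smith",  "postcode": "sw1a 1aa"},
    {"source": "pledgemaker", "name": " JANE SMITH", "postcode": "SW1A1AA"},
    {"source": "access",      "name": "John Doe",    "postcode": "EC1A 1BB"},
]

groups = defaultdict(list)
for rec in records:
    groups[dedupe_key(rec)].append(rec)

# Any key with more than one record is a candidate duplicate, within or
# across sources, to be confirmed by the business side.
duplicates = {k: v for k, v in groups.items() if len(v) > 1}
print(duplicates)
```

Treat the output as candidates for human review rather than automatic merges; false positives at this stage are cheap, silent merges in the target are not.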
Avoid trying to match each legacy system against every other one. With 10 legacy systems that is 10 × 9 / 2 = 45 possible pairwise comparisons, against only 10 comparisons to a common model (or 9, if you pick one of the legacy systems as your Migration Model).
The formal model really helps here because it allows you to express the similarities that are obscured by different naming conventions, as well as the differences. Structural problems are the hardest to resolve (having 5 types of customer in one place but 7 in another), rather than code-value ones (having M and F for sex in one place but Men and Women in another). You need to see beyond the representation in names and physical format to the underlying structure. You can get there by trying to physically map field to field, but it is easy to get lost that way.
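The code-value case really is the easy one: a per-source lookup resolves it mechanically, which is exactly why the structural differences deserve the modelling effort. A tiny sketch, with invented source keys and target values:

```python
# Per-source code-value translation onto the Migration Model's vocabulary.
# "lifeline" and "exceed" and their codes are assumptions for illustration.
SEX_CODES = {
    "lifeline": {"M": "male", "F": "female"},
    "exceed":   {"Men": "male", "Women": "female"},
}

def normalise_sex(source: str, value: str) -> str:
    """Translate one source's code value to the model's canonical value.
    A KeyError here flags an undocumented code worth investigating."""
    return SEX_CODES[source][value]

print(normalise_sex("lifeline", "M"))    # male
print(normalise_sex("exceed", "Women"))  # female
```

No such table will untangle five customer types versus seven; that resolution has to come from the model and the client population, as the next point says.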
Of course, wherever there are differences you have to go back to your client population to agree a resolution. There is no point in creating a solution that is rejected at user acceptance.

Finally, test every assumption. Believe nothing. Every rule you are given should be tested. Better to run a hundred wasted queries in the quiet of the development shop than to find a real show-stopper in the full glare of go-live.
Best of luck. I'm not sure where you are based, but I will shortly be advertising the next of the Data Migration Matters conferences in the LinkedIn group that relates to this site, Data Migration Professionals. It will be held at the British Computer Society HQ, just off Covent Garden in London, on 24th May. It will have demonstrations of profiling and data quality as well as ETL software (which is what we have been talking about), some of which is free to take away. Maybe I'll see you there and you can tell me if this was any help?