How Good is the Source Data?

The main problem Mary points out to Rory is that the separation of address types is not as clear cut as he thought. Some of Mary's contacts are home workers, and for them she has put their "home" address in the "business" address fields. Other contacts are self-employed and for them their "home" is also their business address. In fact, as they review the data, it becomes clear to Rory that Mary's allocation of addresses has been fairly haphazard.

A previous solution's inability to control or correctly organize the data presented to it is often a key reason for replacing it with a new solution. For instance, one of the problems with using spreadsheets to store information is that they usually allow free text entry of data. Users are often inconsistent with their entries, especially if more than one person enters the data, and free text entry puts no control in place to prevent this behavior.

As an example, consider a company whose business systems created a works order for each product they manufactured. Each works order contained a unique number that was used to track the product through manufacture, sales, and after sales support. The number comprised the first letters of the words "Works Order" followed by an incremental number. So the number of the first product produced by this system had the form: "WO0001". Once in use, a problem became apparent: users commonly failed to appreciate that the second character was the letter "O" and would often enter these product numbers as: "W00001". Less commonly, they would enter all the round characters, including the zeros, as the letter "O": "WOOOO1". The result was that in reports a single product could appear as three separate products: "WO0001", "W00001", and "WOOOO1".

The problem was fairly easy to deal with in the main applications. It was more difficult to deal with in the small spreadsheets that managers and supervisors created for their teams to track progress and problems. These were often put together simply and had no validation of input. The result was misreporting, and time wasted tracking problems that had already been dealt with but had been logged under a different form of the works order number.

Therefore, it is extremely likely that existing data will require processing before it is input to an application. To do this successfully, the user and developer must work together to create a set of rules to deal with any inconsistencies. Some processing can be automated. In the works order example, it would be easy to process works order numbers so that a zero at the second character position was converted to "O" and all O's in the following positions were converted to zeros. A simple batch script could be created to carry out this processing. It could even be incorporated into the data import process for the new application. However, some data inconsistencies require manual correction, and there can be little alternative but to have someone go through the data and correct them.
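The works order rule described above can be sketched in a few lines. This is only an illustration, written here in Python: the function name `normalize_works_order` is hypothetical, and it assumes that every works order number is a "W", then an "O", then digits, and that upper-casing the input is acceptable. Anything that does not fit that shape is returned unchanged so that it can be reviewed by hand.

```python
import re


def normalize_works_order(raw: str) -> str:
    """Return the canonical 'WO' + digits form of a works order number.

    Hypothetical helper: assumes the number is 'W', then 'O', then digits.
    """
    text = raw.strip().upper()
    # Accept a letter O or a zero at the second position, and any mix of
    # digits and letter O's in the numeric part that follows.
    match = re.fullmatch(r"W[O0]([0-9O]+)", text)
    if match is None:
        # Not recognizably a works order number: leave it untouched
        # so it can be corrected manually.
        return raw
    # Every letter O after the prefix is really a zero.
    digits = match.group(1).replace("O", "0")
    return "WO" + digits


if __name__ == "__main__":
    for entered in ["WO0001", "W00001", "WOOOO1"]:
        print(entered, "->", normalize_works_order(entered))
```

Run over the three variants from the example, each one comes out as "WO0001". A function like this could be called once as a clean-up pass over old spreadsheets or on every row during import.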

Rory realizes that the address problem is not systematic. That is, there is nothing within the data itself that would allow an automation script to detect that the data needs to be altered. Therefore, he is unable to create a script to correct the problem. However, when he suggests to Mary that she go through all her contacts and correct the inconsistencies, she states that she cannot give the time required to do that. Therefore, Rory suggests a compromise: he prints out a listing of the contacts and addresses, and then asks Mary to skim through these and mark any address she spots as being incorrectly assigned. He suggests that if it is a home address that should be a business address, she mark it with a "B", and if it is a business address that should be a home address, she mark it with an "H". Self-employed workers should be marked with "HB", which means that the same address is used for both home and business. As this is a lot easier to do than manually moving each address within Mary's email application, she readily agrees to do this. When she is finished, Rory is able to process the raw data and move the addresses as marked.
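Once the marks have been typed in alongside the exported data, applying them is mechanical. The sketch below shows one way this might look in Python. The text does not describe Rory's actual script or data layout, so the `AddressEntry` record, its field names, and the `apply_marks` function are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AddressEntry:
    """One exported address row (hypothetical layout)."""
    contact: str
    kind: str        # "home" or "business", as originally exported
    address: str
    mark: str = ""   # "", "B", "H" or "HB", as written on the printout


# What each mark says the address should be filed as.
MARK_TO_KINDS = {
    "B": ["business"],
    "H": ["home"],
    "HB": ["home", "business"],
}


def apply_marks(entries):
    """Return new entries with each address filed under its corrected kind."""
    corrected = []
    for entry in entries:
        mark = entry.mark.strip().upper()
        if mark == "":
            kinds = [entry.kind]          # unmarked: keep as exported
        elif mark in MARK_TO_KINDS:
            kinds = MARK_TO_KINDS[mark]
        else:
            raise ValueError(f"Unknown mark {entry.mark!r} for {entry.contact}")
        # An "HB" address is written out twice, once per kind.
        for kind in kinds:
            corrected.append(AddressEntry(entry.contact, kind, entry.address))
    return corrected
```

Rejecting unknown marks, rather than guessing, keeps a stray pen stroke on the printout from silently moving an address.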

However, even after going through this process, Rory realizes that some erroneous address allocation has not been corrected. It becomes apparent that the new application will need to have the facility to easily move addresses from the person to the company so that inconsistencies within the data can be corrected as the data is used.

Note

Assume errors will get through from the source data and provide tools within your application to easily correct those errors.

You can waste a lot of time and effort tracking down and correcting every error in source data. To compound the problem, errors can often be hard to identify within raw data, yet be only too obvious once the application goes into production. A pragmatic solution to that problem is to accept that some errors will get through, despite your best efforts. Make a best effort to correct errors before and during the import process. Then make sure you build in processes and methods that allow users to correct errors as they find them. The key to the success of this strategy is making both identification and correction easy for the users of the application.
