CHAPTER 20

Estimating ETL Projects

Of the various duties assigned to the ETL project manager or architect, one of the most critical is establishing a project estimate. When considering a new project of any type, executives, managers, and other decision makers will ask first, “How long will it take?” Appropriately setting time frame expectations is critical to the success of the project.

In this chapter, I’ll discuss the challenges and the various components of effective project estimation. Although I won’t define a silver-bullet formula here, you should walk away with some things to keep in mind when constructing estimates for level of effort.

What is being measured?

When I talk about estimating ETL projects, I mean setting an expectation of the level of effort required to complete the initiative. Depending on the type of organization and the type of projects, one or more of the following metrics may be used to measure the expected effort:

  • Number of hours to complete the project. In ETL projects where the work is hired out to a separate vendor, the sum of hours required to complete the work is most often the chief metric used for estimating the project effort.
  • Time to market. ETL projects that are handled internally are most often estimated by the calendar time until the solution is implemented. Time to market may also be a key metric used to monitor outsourced projects where rapid development could lead to a competitive advantage.
  • Level of staff engagement. Conscientious business leaders also take into account the impact of ETL projects on their internal staff, regardless of whether the development is handled internally or through a service provider. Even for outsourced projects, executives and other stakeholders will often demand to know how much time their nontechnical resources (those providing business input, testing, and validating results) must spend supporting the development initiative.

Why estimate?

Estimating the level of effort for ETL projects is as much a part of the process as deploying the SSIS package. Even in the smallest of organizations, responsible project owners will demand to know the impact of any such proposed solution. They will ask the following questions:

  • How much will it cost?
  • How will my staff be impacted?
  • How long will it take?

Challenges

Any good project manager will tell you that the hardest part of almost any initiative (not just ETL projects) is properly estimating the level of effort and amount of time involved in getting to the finish line. Why is it so difficult?

It’s difficult because it requires communication

A large part of the reason that project estimation is hard is that a project requires excellent communication. Now, don’t get me wrong—I don’t want to paint the picture of the stereotypical computer geek sitting in a closet slinging code for 18 hours a day while his superiors toss in pizza and Mountain Dew to keep him fueled. Today’s computer nerd is smart, eloquent, and good with people. (Okay, not all of them, but you get where I’m going.)

Even for the most skilled people person, finding the appropriate amount of communication for a project is difficult. Spend too much time talking and you don’t get the work done; spend too little, and the developers work from specs that are pure fiction. Among the chief challenges and downfalls with regard to communication are the following:

  • Not asking enough of the right questions. Properly engaging stakeholders (project champions, executives, and end users) to assess their expectations and business needs is critical. One of the most significant challenges—and most frequent mistakes—is a breakdown between those architecting ETL projects and those who will be impacted by them.
  • Incorrect assumptions. The last point notwithstanding, it’s almost impossible to interrogate stakeholders about every possible decision that will need to be made during architecture and development. Assumptions are a natural part of the project life cycle and are critical to the efficiency of any such initiative. At the risk of stating the obvious, the difficulty here is making the correct assumptions. There’s a significant element of guesswork involved here, and the key to overcoming this challenge is making intelligent, fact-based assumptions.
  • Changing requirements. Solution developers often point to this as the chief cause of blown timelines, and too often it is assumed to be caused by either incompetence or malice. Though I have seen a few occasions where project sponsors try to add to the required deliverables as a way to get a little extra work out of the developers, my experience tells me that changing requirements are generally the result of an evolution of understanding as the project moves along.
  • Language differences. Here I’m referring not to technical programming languages, but to the way we communicate. Any project will likely engage a blend of technical and nontechnical personnel with various facets and depths of experience. As such, the languages we speak can be vastly different. Techies tend to speak geek, financial professionals have their own lingo, marketing folks use acronyms and industry terms, and so forth.
  • Engaging an inappropriate number of people. There is a correct number of people for each ETL project. What is that number? As with everything else in the database world, it depends. Too few people and you risk not getting enough user perspective to address the possible points of failure. Too many, and the time spent talking crowds out the time spent getting the work done.

Before becoming a consultant, I spent several years in the healthcare industry, and learned first-hand that even the most fundamental understandings can be fouled up by differences in communication. Think about the concept of a day: the most commonly accepted definition of a day is the period defined by the calendar and clock, from midnight to midnight. However, there are segments of business that use different definitions of a day. In my time in healthcare, I found that a day often is defined differently by business units within the same organization, especially when the day applies to a patient visit. Some divisions considered a day to be the common midnight-to-midnight calendar period. Others considered a day to be any 24-hour period, regardless of when it started or ended (for example, a patient visit lasting from 7 p.m. on a Friday to 6 p.m. on the following Saturday would be considered one day). Still others did not recognize the concept of multiple-day periods, and would consider any multi-day patient visit to be a series of single-day visits.

Needless to say, reconciling these differences in communication can be difficult, and just as importantly, can cost valuable time. For ETL professionals, a big part of the job is to integrate data from multiple sources and provide accurate and consistent reporting across multiple domains of information. Certainly the alignment of these various data sets is a problem that can be solved, but getting out in front of this problem early is critical for an ETL project. If different groups of stakeholders expect differing definitions of something as fundamental as a day, it’s essential to design the ETL solution with these expectations in mind. Otherwise, any project timeline estimate is likely to be completely off, since a retrofit would likely require far more effort than designing the solution properly the first time.
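To make the point concrete, here is a minimal Python sketch of the three definitions of a day described above, applied to the same patient visit. The function names are hypothetical, and two details are my own assumptions rather than rules from any real system: a partial 24-hour period is rounded up to a full day, and a multi-day visit is split into single-day visits at midnight.

```python
from datetime import datetime, timedelta
import math


def calendar_days(start: datetime, end: datetime) -> int:
    """Midnight-to-midnight: count every calendar date the visit touches."""
    return (end.date() - start.date()).days + 1


def elapsed_days(start: datetime, end: datetime) -> int:
    """Any 24-hour period: count elapsed 24-hour blocks.

    Assumption: a partial block is rounded up, with a minimum of one day.
    """
    hours = (end - start).total_seconds() / 3600
    return max(1, math.ceil(hours / 24))


def single_day_visits(start: datetime, end: datetime) -> list:
    """No multi-day visits: emit one single-day visit per calendar date.

    Assumption: the split happens at midnight.
    """
    first = start.date()
    span = (end.date() - first).days
    return [first + timedelta(days=offset) for offset in range(span + 1)]


# The visit from the text: Friday 7 p.m. to Saturday 6 p.m. (dates are arbitrary).
admit = datetime(2024, 3, 1, 19, 0)
discharge = datetime(2024, 3, 2, 18, 0)

print(calendar_days(admit, discharge))           # 2
print(elapsed_days(admit, discharge))            # 1
print(len(single_day_visits(admit, discharge)))  # 2
```

The same visit yields two days, one day, or two separate one-day visits depending on who is asking. Discovering which definition each stakeholder group expects is exactly the kind of question that belongs in the estimate, not in the retrofit.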

It’s difficult because it requires guesswork

Those who develop ETL processes are, to some extent, scientists—we deal with rules, formulas, and algorithms that can usually be leveraged to predict output based on various input factors. Applying a specific mix of inputs in the right order ought to, within a small margin of error, result in predictable and reproducible output.

However, the same cannot always be said for predicting the life cycle of a development initiative. Although there are ways to predict some elements of ETL projects, the sheer number of unknowns—specifically, those things that cannot be fully known until the project is well underway—makes the process of building an accurate time sequence very difficult.

When estimating the effort required for a successful ETL initiative, we base our figures on all of the information we have at the time to create the best approximation possible. But even in the best-case scenario, it’s still a guess (albeit an educated one).

It’s difficult because it relies on technology

It goes without saying that any successful ETL endeavor requires reliable hardware and software. Getting the right tools, sharpening the skillset required to use those tools, and keeping everything up and running is essential. This is especially true when blending architectures during system consolidation or as part of a merger or acquisition. Systems that speak different languages require the right blend of hardware and software tools to work well. The same could be said for very large ETL initiatives, where the sheer volume of data can bring even world-class systems to their knees. Further complicating matters, technical challenges are sometimes discovered late in the project, forcing a retrofit and costing valuable time.

Note that in the list of difficulties in ETL project estimation, I placed technical challenges at the end. I did so with a purpose. I don’t want to diminish the role that technology plays in a successful and efficient ETL initiative: without the proper systems, professionals’ ability to effectively do their jobs is inhibited, and the project timeline will suffer. However, in most cases, the technical components of an ETL project present far less risk than communication issues. Technical problems can usually be solved by writing a check (assuming the organization has deep enough pockets), but the same cannot be said for deficiencies in communication.

The secret to estimating ETL project timelines is...

... that there is no secret. As I mentioned earlier, there is no secret sauce, no multiplier, no algorithm that can determine with certainty the amount of time and effort required to bring an ETL project to completion.

Though there is no magic formula for creating an accurate estimate, there are some best practices that can help to make the process of creating realistic timelines a little easier.

Don’t forget the little things

An ETL initiative is a development project. It’s easy to get caught up in the “development” part and lose sight of the “project” component. Think about it: the typical picture of a development effort is a team of developers staring at a bank of monitors, slinging code by the kilobyte. Although this picture is realistic, coding is certainly not the only element of an ETL project.

Every project is different. However, there are groups of activities that are common to almost any development initiative. Some of the elements you will need to consider include:

  • Requirements gathering. Getting the green light to start on a project does not imply a license to start development. In most projects, the project manager, business analyst, and/or developer will need to research and document the required behaviors of the final product. This phase typically requires a number of user interviews, so don’t be surprised by the amount of time required here.
  • Documentation. I’ll confess: I don’t like creating documentation. (News flash: nobody does.) However, accurate documentation is essential for the long-term supportability of any ETL initiative, and in many cases will be contractually required as a deliverable. Remember to budget sufficient time not just to create the documentation, but to update it as the technical elements of the solution evolve during the development life cycle.
  • Testing. This is one of the most frequently underestimated components of an ETL project. Testing an ETL solution often presents difficulties that don’t necessarily exist in other development projects. For example, when developing a Windows application, it is usually possible to outsource wholesale testing to users who don’t necessarily have deep knowledge of the information domain. On the other hand, testing and validating the results of an ETL process generally requires personnel who are deeply knowledgeable about the underlying data. The testing and validation cycles can be time-consuming. Don’t underestimate here—be sure to allow enough wiggle room to adequately test and validate the results.
  • Environmental promotion. ETL development may require you to move the solution between environments—for example, moving from development to testing to staging and finally to production. In some organizations, especially large companies, there are specific requirements that must be met before promoting a solution (particularly when targeting the final production environment). Furthermore, moving or changing code ad-hoc is usually disallowed; instead, specific deployment windows provide structure and documentation for any code changes.
  • Multiple iterations. Depending on the chosen development methodology, the project may iterate over various cycles. Be sure to include the iterations in your timeline. Even if an iterative methodology is not used, these types of projects will still have repetitive components. For example, if the code has to go back through development for a correction or feature addition, remember that the solution will also have to go back through testing.
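As a rough illustration of how the elements in the list above add up, the following Python sketch totals a set of phases and counts the repeating ones (development and testing) once per iteration. This is bookkeeping, not a formula: the `Phase` class, the `total_hours` function, and every hour figure are hypothetical, and the sketch assumes each pass costs the same, which is rarely true in practice.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    hours: float
    per_iteration: bool = False  # True if the phase repeats on every pass


def total_hours(phases, iterations=1):
    """Sum phase hours, counting per-iteration phases once per pass."""
    return sum(p.hours * (iterations if p.per_iteration else 1) for p in phases)


# Hypothetical figures, for illustration only.
phases = [
    Phase("requirements gathering", 40),
    Phase("development", 120, per_iteration=True),
    Phase("testing and validation", 60, per_iteration=True),
    Phase("documentation", 24),
    Phase("environment promotion", 16),
]

print(total_hours(phases, iterations=1))  # 260
print(total_hours(phases, iterations=2))  # 440
```

Even in this simplified form, a second pass through development and testing adds 180 hours to a 260-hour estimate, which is why leaving iterations out of the timeline hurts so much.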

Plan for the unexpected

Let me lead by saying that I don’t advocate the arbitrary padding of project estimates. Any knucklehead can create a rough timeline and multiply it by 100 just to be safe. You can get away with ridiculously padded estimates—but only for a little while.

Here’s how it plays out: you create a bloated project timeline that gives enough time to complete the ETL initiative even if you were transforming the data with a chisel and stone tablets. The project hits a few bumps along the way but completes in a reasonable time, well ahead of your estimate. Next time you create the estimate, the project sponsor has less faith in your estimate, and encourages you to cut it down. From there, every time you come in ridiculously ahead of schedule you cut into your credibility, and project sponsors will no longer take seriously any of your estimates.

That being said, any project estimate should have some amount of wiggle room for the inevitable unexpected snafu. I’ve never worked on a single ETL initiative that didn’t have some hiccup outside the scope and timeline of the initial estimate. It happens—regularly. The important thing here is to know the elements of the project that are most at risk for slowdowns, and take those risks into account when constructing the proposed timeline. Timelines can be impacted by various causes, but a few of the particularly risky elements include:

  • Key people who are unavailable, overworked, or disengaged. ETL solutions are not developed in isolation. Even if the construction of the technical elements occurs quickly, if stakeholders critical to the success of the project can’t get into it, the timeline is likely to swell. As much as possible, stay linked up with people. Know where they are in terms of their involvement, and make it easy for them to stay engaged.
  • Weak project champion. The project champion is the one driving the bus, and typically has the most to gain or lose on the project. This is the person who keeps executives excited about the project, and generally serves as the cheerleader for the initiative (among other duties). If this person is lackadaisical or less than enthusiastic about the process, consider it a significant risk.
  • Many moving parts. It goes without saying that integrating three sources is far easier than integrating 30. Keep in mind that, in many cases, adding sources or destinations causes an exponential rather than a linear increase in the level of effort.
  • Previous failures. Has this initiative been unsuccessfully attempted before? If it failed, are the key causes of failure still present? Don’t underestimate a history of failure, especially if the conditions have not changed since the last attempt.
  • Technical time bombs. Sometimes, technical problems lie in wait, eager to rear their ugly heads at the most inopportune moment. How’s the disk space on the affected systems? Is the network burdened by slow links or excessive traffic? Is there a piece of equipment that represents a single point of failure in the ETL pipeline?

Sometimes these risks turn out to be benign, but don’t be caught off guard. The more you understand how timelines can be affected by anomalies caused by these and other situations, the better you’ll get at creating accurate project estimates.

Know the personalities involved

“If you think working with data is difficult, try working with people.” Although this stereotypical geek speak is neither constructive nor politically correct, the fact remains that the personalities involved in a project present a significant element of unpredictability that can have a great deal of impact on the timeline of a project. It’s not always possible to know which specific people will be taking part in a project when putting together your estimate. However, if that information is available, it’s prudent to consider the abilities, temperament, and prior history of the people you know will be engaged on the project. Will you be working with a business liaison known for being unnecessarily difficult? Is the project manager a rock star with a history of on-time project delivery? If good fortune gives you access to this information before you create your estimate, it’s perfectly reasonable to bias your estimate based on past performance of key players.

Learn to do it right by doing it wrong

There is no substitute for experience. Spending time getting to know the ETL process will help to create a better understanding of what is required for a successful ETL initiative, and will help to improve the accuracy of level-of-effort estimates. Whether the estimate is performed by the ETL developer, architect, project manager, or a combination of all three, experience is the most useful tool available.

Don’t be overly afraid of being wrong. I mentioned earlier that estimates are essentially guesses. Sometimes when you guess, you’re right. Sometimes you’re wrong. You’ll make mistakes in estimation, and you’ll learn from them. The more mistakes you make, the more you’ll learn how those little nuances, previously hidden or otherwise insignificant, can affect your project timeline.

When the timeline slips, communicate early and often

Bad news is never good, but it’s easier to handle with a little warning. In the inevitable case where a risk event turns into a slipping point for the project, get out in front of it! Communicate with project staff members, the project champion, and other stakeholders as necessary. With some advance warning, it may be possible to realign resources or change the sequence of events to minimize or even neutralize a hiccup in the schedule.

Summary

Creating accurate project estimates is both difficult and necessary. It is a fragile process that often relies on sketchy information and many unknowns. Estimating project timelines will always require a significant bit of guesswork, but it doesn’t have to be a complete shot in the dark. Accept that there are some things you cannot predict, but use the information you do have to craft a reasonable project timeline. Rarely will an estimate be 100% correct, but with experience, attentiveness, and good communication, you can build your own estimating success story.
