Out of the various duties assigned to the ETL project manager or architect, one of the most critical is the establishment of a project estimate. When considering a new project of any type, executives, managers, and other decision makers will ask first, “How long will it take?” Appropriately setting time frame expectations is critical to the success of the project.
In this chapter, I’ll discuss the challenges and the various components of effective project estimation. Although I won’t define a silver-bullet formula here, you should walk away with some things to keep in mind when constructing estimates for level of effort.
When I talk about estimating ETL projects, I mean setting an expectation of the level of effort required to complete the initiative. Depending on the type of organization and the type of projects, one or more of the following metrics may be used to measure the expected effort:
Estimating the level of effort for ETL projects is as much a part of the process as deploying the SSIS package. Even in the smallest of organizations, responsible project owners will demand to know the impact of any such proposed solution. They will ask the following questions:
Challenges
Any good project manager will tell you that the hardest part of almost any initiative (not just ETL projects) is properly estimating the level of effort and amount of time involved in getting to the finish line. Why is it so difficult?
It’s difficult because it requires communication
Project estimation is hard largely because a project requires excellent communication. Now, don’t get me wrong: I don’t want to paint the picture of the stereotypical computer geek sitting in a closet slinging code for 18 hours a day while his superiors toss in pizza and Mountain Dew to keep him fueled. Today’s computer nerd is smart, eloquent, and good with people. (Okay, not all of them, but you get where I’m going.)
Even for the most skilled people person, finding the appropriate amount of communication for a project is difficult. Spend too much time talking and you don’t get the work done; spend too little, and the developers work from specs that are pure fiction. Among the chief challenges and downfalls with regard to communication are the following:
Before becoming a consultant, I spent several years in the healthcare industry and learned firsthand that even the most fundamental understandings can be fouled up by differences in communication. Think about the concept of a day: the most commonly accepted definition is the period defined by the calendar and clock, from midnight to midnight. However, some segments of business use different definitions. In my time in healthcare, I found that a day was often defined differently by business units within the same organization, especially when the day applied to a patient visit. Some divisions considered a day to be the common midnight-to-midnight calendar period. Others considered a day to be any 24-hour period, regardless of when it started or ended (for example, a patient visit lasting from 7 p.m. on a Friday to 6 p.m. the following day would be considered one day). Still others did not recognize the concept of multiple-day periods and would consider any multi-day patient visit to be a series of single-day visits.
Needless to say, reconciling these differences in communication can be difficult and, just as importantly, can cost valuable time. As ETL professionals, a big part of our job is to integrate data from multiple sources and provide accurate and consistent reporting across multiple domains of information. Aligning these data sets is certainly a solvable problem, but getting out in front of it early is critical for an ETL project. If different groups of stakeholders expect differing definitions of something as fundamental as a day, it’s essential to design the ETL solution with these expectations in mind. Otherwise, any project timeline estimate is likely to be far off, since a retrofit would probably require much more effort than designing the solution properly the first time.
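To make the healthcare example concrete, here is a minimal Python sketch that counts the same Friday-evening-to-Saturday-evening visit under each of the three definitions of a day. The function names and the specific dates are hypothetical, and each rule is a simplified reading of the definitions described above, not the logic of any particular healthcare system.

```python
import math
from datetime import datetime, time, timedelta


def calendar_days(start: datetime, end: datetime) -> int:
    """Midnight-to-midnight rule: count every calendar date the visit touches."""
    return (end.date() - start.date()).days + 1


def rolling_24h_days(start: datetime, end: datetime) -> int:
    """Any-24-hour-period rule: each started 24-hour block counts as one day."""
    hours = (end - start) / timedelta(hours=1)
    return max(1, math.ceil(hours / 24))


def split_into_single_day_visits(start: datetime, end: datetime) -> list[tuple[datetime, datetime]]:
    """No-multi-day rule: break the visit into one segment per calendar date."""
    segments = []
    seg_start = start
    while seg_start.date() < end.date():
        # Close the current segment at the next midnight and start a new one.
        next_midnight = datetime.combine(seg_start.date() + timedelta(days=1), time.min)
        segments.append((seg_start, next_midnight))
        seg_start = next_midnight
    if seg_start < end or not segments:
        segments.append((seg_start, end))
    return segments


if __name__ == "__main__":
    # The visit from the text: 7 p.m. Friday to 6 p.m. Saturday (arbitrary dates: 1-2 March 2024).
    start = datetime(2024, 3, 1, 19, 0)
    end = datetime(2024, 3, 2, 18, 0)
    print("calendar days:", calendar_days(start, end))
    print("24-hour days:", rolling_24h_days(start, end))
    print("single-day visits:", len(split_into_single_day_visits(start, end)))
```

Run against that one visit, the three rules report 2, 1, and 2 "days" respectively: the same source record, three different answers, which is exactly the kind of discrepancy that has to be settled before the estimate is built.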
It’s difficult because it requires guesswork
Those who develop ETL processes are, to some extent, scientists—we deal with rules, formulas, and algorithms that can usually be leveraged to predict output based on various input factors. Applying a specific mix of inputs in the right order ought to, within a small margin of error, result in predictable and reproducible output.
However, the same cannot always be said for predicting the life cycle of a development initiative. Although there are ways to predict some elements of ETL projects, the sheer number of unknowns—specifically, those things that cannot be fully known until the project is well underway—makes the process of building an accurate time sequence very difficult.
When estimating the effort required for a successful ETL initiative, we base our figures on all of the information we have at the time to create the best approximation possible. But even in the best-case scenario, it’s still a guess (albeit an educated one).
It’s difficult because it relies on technology
It goes without saying that any successful ETL endeavor requires reliable hardware and software. Getting the right tools, sharpening the skillset required to use those tools, and keeping everything up and running is essential. This is especially true when blending architectures during system consolidation or as part of a merger or acquisition. Systems that speak different languages require the right blend of hardware and software tools to work well. The same could be said for very large ETL initiatives, where the sheer volume of data can bring even world-class systems to their knees. Further complicating matters, technical challenges are sometimes discovered late in the project, forcing a retrofit and costing valuable time.
Note that in the list of difficulties in ETL project estimation, I placed technical challenges at the end. I did so with a purpose. I don’t want to diminish the role that technology plays in a successful and efficient ETL initiative: without the proper systems, professionals’ ability to effectively do their jobs is inhibited, and the project timeline will suffer. However, in most cases, the technical components of an ETL project present far less risk than communication issues. Technical problems can usually be solved by writing a check (assuming the organization has deep enough pockets), but the same cannot be said for deficiencies in communication.
The secret to estimating ETL project timelines is…
... that there is no secret. As I mentioned earlier, there is no secret sauce, no multiplier, no algorithm that can determine with certainty the amount of time and effort required to bring an ETL project to completion.
Though there is no magic formula for creating an accurate estimate, there are some best practices that can help to make the process of creating realistic timelines a little easier.
Don’t forget the little things
An ETL initiative is a development project. It’s easy to get caught up in the “development” part and lose sight of the “project” component. Think about it: the typical picture of a development effort is a team of developers staring at a bank of monitors, slinging code by the kilobyte. Although that picture is realistic, coding is certainly not the only element of an ETL project.
Every project is different. However, there are groups of activities that are common to almost any development initiative. Some of the elements you will need to consider include:
Let me lead by saying that I don’t advocate the arbitrary padding of project estimates. Any knucklehead can create a rough timeline and multiply it by 100 just to be safe. You can get away with ridiculously padded estimates—but only for a little while.
Here’s how it plays out: you create a bloated project timeline that allows enough time to complete the ETL initiative even if you were transforming the data with a chisel and stone tablets. The project hits a few bumps along the way but finishes in a reasonable time, well ahead of your estimate. The next time you create an estimate, the project sponsor has less faith in it and encourages you to cut it down. From there, every time you come in ridiculously ahead of schedule, you erode your credibility, until project sponsors no longer take any of your estimates seriously.
That being said, any project estimate should have some amount of wiggle room for the inevitable unexpected snafu. I’ve never worked on a single ETL initiative that didn’t have some hiccup outside the scope and timeline of the initial estimate. It happens—regularly. The important thing here is to know the elements of the project that are most at risk for slowdowns, and take those risks into account when constructing the proposed timeline. Timelines can be impacted by various causes, but a few of the particularly risky elements include:
Sometimes these risks turn out to be benign, but don’t be caught off guard. The more you understand how timelines can be affected by anomalies caused by these and other situations, the better you’ll get at creating accurate project estimates.
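One way to put this advice into numbers is a probability-times-impact contingency: each risky element gets its own explicit allowance instead of a blanket multiplier applied to the whole estimate. This is a common risk-management technique rather than anything prescribed in this chapter, and the `Task` structure, task names, and hour figures below are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One line item in an estimate (hypothetical structure for illustration)."""
    name: str
    base_hours: float         # best estimate if nothing goes wrong
    risk_probability: float   # judged chance (0.0 to 1.0) that the known risk occurs
    risk_impact_hours: float  # extra hours needed if it does occur


def contingency_hours(tasks: list[Task]) -> float:
    """Expected extra effort: probability x impact, summed over all tasks."""
    return sum(t.risk_probability * t.risk_impact_hours for t in tasks)


def build_estimate(tasks: list[Task]) -> tuple[float, float, float]:
    """Return (base hours, contingency hours, total hours)."""
    base = sum(t.base_hours for t in tasks)
    contingency = contingency_hours(tasks)
    return base, contingency, base + contingency


if __name__ == "__main__":
    # Illustrative figures only; not data from any real project.
    tasks = [
        Task("Extract from legacy source", 40, 0.5, 16),
        Task("Reconcile business rules", 80, 0.25, 40),
        Task("Deploy packages", 16, 0.1, 8),
    ]
    base, contingency, total = build_estimate(tasks)
    print(f"base={base:.1f}h contingency={contingency:.1f}h total={total:.1f}h")
```

Because the allowance is tied to named risks, it is easy to defend to a sponsor and easy to revisit: when a risk turns out to be benign, its share of the contingency can be released rather than left as unexplained padding.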
Know the personalities involved
“If you think working with data is difficult, try working with people.” Although this stereotypical geek speak is neither constructive nor politically correct, the fact remains that the personalities involved in a project present a significant element of unpredictability that can have a great deal of impact on the timeline of a project. It’s not always possible to know which specific people will be taking part in a project when putting together your estimate. However, if that information is available, it’s prudent to consider the abilities, temperament, and prior history of the people you know will be engaged on the project. Will you be working with a business liaison known for being unnecessarily difficult? Is the project manager a rock star with a history of on-time project delivery? If good fortune gives you access to this information before you create your estimate, it’s perfectly reasonable to bias your estimate based on past performance of key players.
Learn to do it right by doing it wrong
There is no substitute for experience. Spending time getting to know the ETL process will help to create a better understanding of what is required for a successful ETL initiative, and will help to improve the accuracy of level-of-effort estimates. Whether the estimate is performed by the ETL developer, architect, project manager, or a combination of all three, experience is the most useful tool available.
Don’t be overly afraid of being wrong. I mentioned earlier that estimates are essentially guesses. Sometimes when you guess, you’re right. Sometimes you’re wrong. You’ll make mistakes in estimation, and you’ll learn from them. The more mistakes you make, the more you’ll learn how those little nuances, previously hidden or otherwise insignificant, can affect your project timeline.
When the timeline slips, communicate early and often
Bad news is never good, but it’s easier to handle with a little warning. In the inevitable case where a risk event turns into a slipping point for the project, get out in front of it! Communicate with project staff members, the project champion, and other stakeholders as necessary. With some advance warning, it may be possible to realign resources or change the sequence of events to minimize or even neutralize a hiccup in the schedule.
Summary
Creating accurate project estimates is both difficult and necessary. It is a fragile process that often relies on sketchy information and many unknowns. Estimating project timelines will always require a significant amount of guesswork, but it doesn’t have to be a complete shot in the dark. Accept that there are some things you cannot predict, but use the information you do have to craft a reasonable project timeline. Rarely will an estimate be 100% correct, but with experience, attentiveness, and good communication, you can build your own estimating success story.