CHAPTER 2

image

Introduction to Requirements

Whatever pursuit you undertake, the requirements should start with a love of what it is that you are pursuing.

—Bill Toomey

When tasked to design a database system, you have to be mindful that users often aren’t technologists. Sometimes, you may even have to understand that people with organizational power or check signing rights may not have the proper amount of intelligence doled out to them. So when you start to gather requirements, relying on the client’s management professionals to know what they want and how to implement it is almost always complete insanity. On the flip side, keep in mind that you’re building a system to solve a business problem first for the users, then for the people who sign the checks, and not for yourself. What the users (and check writers) want is the most important, so you must consider that first and foremost.

There’s an old saying that you shouldn’t build users a Cadillac when all they want is a Volkswagen (though when this saying was coined, a VW wasn’t quite as expensive as it is today). But overdelivering is a lot better than underdelivering. Consider how excited your client would be if instead of the Volkswagen you had given them a 1983 Yugo (now or then, it doesn’t truly matter!). While the concepts behind these vehicles are similar (four wheels, steering wheel, etc.), erring on the side of quality is usually better, if err you must. Admittedly more often the problem is that the clients wanted a car and what gets delivered is something more akin to a bowl of fruit. Even the nicest bowl of fruit isn’t going to please your users if they paid thirty large for a mode of transportation.

The next problem is that users don’t think about databases; they think about user interfaces (UIs) and reports. Of course, a lot of what the user specifies for a UI or report format is actually going to end up reflected in the database design; it is up to you to be certain that there is enough in the requirements to design storage without too much consideration about how it will be displayed, processed, or used. The data has an essence of its own that must be obeyed at this point in the process, or you will find yourself in a battle with the structures you concoct. In this chapter, we will go through some of the basic sorts of data you want to get and locations to look to make sure you are gathering the right kinds of requirements to begin the database design process.

Of course, if you are a newbie, you are probably thinking that this all sounds like a lot of writing and not a lot of coding. If so, you get a gold star for reading comprehension. No matter how you slice it, planning the project is like this. If you are lucky, you will have analysts who do the requirements gathering so you can design and code software. However, the importance of making sure someone gathers requirements cannot be understated.

During a software project (and really any project, but let’s focus on software projects), there are phases that are commonly gone through:

  • Requirement gathering: Document what a system is to be, and identify the criteria that will make the project a success.
  • Design: Translate the requirements into a plan for implementation.
  • Implementation: Code the software.
  • Testing: Verify that the software does what it is supposed to do.
  • Maintenance: Make changes to address problems not caught in testing.

Each phase of the project after requirement gathering relies on the requirements to make sure that the target is met. Requirements are like a roadmap, giving you the direction and target to get there. Trying to build your database without first outlining those requirements is like taking a trip without your map. The journey may be fun, but you may find you should have taken that left turn at Albuquerque, so instead of sunning your feathers on Pismo beach, you have to fight an abominable snowman. Without decent requirements, a very large percentage of projects fail to meet the users’ needs. A very reasonable discussion that needs to be considered is how many requirements are enough. In the early days of software development, these phases were done one at a time for the entire project, so you gathered all requirements that would ever be needed and then designed the entire software project before any coding started, and so on. This method of arranging a project has been given the somewhat derogatory name of “waterfall method” because the output of one step flowed into another.

The important point I want to make clear in this chapter is simple. The waterfall process in total may have been a complete failure, but the steps involved were not. Each of these steps will be performed whether you like it or not. I have been on projects where we started implementation almost simultaneously with the start of the project. Eventually, we had to go back to gather requirements to find out why the user wasn’t happy with our output. And the times when we jumped directly from gathering requirements to implementation were a huge mess, because every programmer did his or her own thing, and eventually every database, every object, and every interface in the system looked completely different. It is a mess that is probably still being dug out from today.

This book is truly about design and implementation, and after this chapter, I am going to assume requirements are finished, and the design phase has begun. Many books have been written about the software requirements gathering and documenting process, so I am not going to even attempt to come up with an example of requirements. Rather, I’ll just make a quick list of what I look for in requirements. As writer Gelett Burress once said about art, “I don’t know anything about art, but I know what I like,” and the same is really quite true when it comes to requirements. In any case, requirements should be captured, and you can generally tell the good from the bad by a few key criteria:

  • Requirements should generally include very few technical details about how a problem will be solved; they should contain only the definition of the problem and success criteria. For example, a good requirements document might say “the clerks have to do all of their adding in their heads, and this is slow and error prone. For project success, we would prefer the math done in a manner that avoids error.” A poor requirements document would exchange the last phrase for “. . . we would prefer the math be done using a calculator.” A calculator might be the solution, but the decision should be left to the technologist.
  • The language used should be a specific as possible. As an example, consider a statement like “we only pay new-hire DBAs $20,000 a year, and the first raise is after six months.” If this was the actual requirement, the company could never hire a qualified DBA—ever. And if you implemented this requirement in the software as is, the first time the company wanted to break the rule (like if Paul Nielsen became available), that user would curse your name, hire Paul as a CEO, and after six months, change his designation to DBA. (Users will find a way.) If the requirement was written specifically enough, it would have said “We usually only. . .”, which is implemented much differently.
  • Requirements should be easily read and validated by customers. Pure and simple, use language the users can understand, not technical jargon that they just gloss over so they don’t realize that you were wrong until their software fails to meet their needs.

For my mind, it really doesn’t matter how you document requirements, just as long as they get written down. Write them down. Write them down. Hopefully, if you forget the rest of what I said in this chapter, you’ll remember that. If you are married or have parents, you have probably made the mistake of saying, “Yes ______, I promise I will get that done for you” and then promptly forgetting what was said exactly so an argument eventually occurs. “Yes, you did say that you wanted the screen blue!” you say to your customers. At this point, you have just called them liars or stupid, and that is not a great business practice. On the other hand, if you forward the document in which they agreed to color the screen blue, taking responsibility for their mistake is in their court.

Finally, how will we use written requirements in the rest of the software creation process? In the design phase, requirements are your guide to how to mold your software. The technical bits are yours (or corporate standards) to determine: two tables or three, stored procedures or ad hoc access, C# or VB? But the final output should be verifiable by comparing the design to the requirements. And when it is time to do the overall system tests, you will use the requirements as the target for success.

In this chapter, I will cover two particular parts of the requirements gathering process:

  • Documenting requirements: I’ll briefly introduce the types of concerns you’ll have throughout the project process in terms of documenting requirements.
  • Looking for requirements: Here, I’ll talk about the places to find information and some techniques for mining that information.

Requirements are not a trivial part of a project, and most certainly should not be omitted, but like anything, they can be overdone. This chapter will give you a bit of advice on where to look, or if you are in the happy programmer position of not being the one gathering requirements, what to make sure has been looked at. The sad reality of programming is that the system you create stinks because the requirements that you are given stink, it won’t be the requirements gatherer who gets to recode.

Documenting Requirements

If you’ve ever traveled to a place where no one speaks the same language as you, you know the feeling of being isolated based solely on communication. Everything everyone says sounds weird to you, and no matter how often you ask where the bathroom is, all you get is this blank look back. It has nothing to do with intelligence; it’s because you aren’t speaking the same language. A word of advice: you can’t expect the entire population of another country to learn your language perfectly just so you can get what you need. It works better if you learn their language. Even when two people speak the same basic language, often there can be dialects and phrasing that can be confusing.

Information technology professionals and our clients tend to have these sorts of communication issues, because frequently, we technology types don’t speak the same dialect or even the same language as our clients. Clients tend to think in the language of their industry, and we tend to think in terms of computer solutions. You probably have the same feelings when you are the user as they do. For example, think about SQL Server’s tools. We relational programmers have trouble communicating to the tool designers what we want in SQL Server’s tools. They do an adequate job for most tasks, but clearly, they aren’t completely on the same page as the users.

During the process of analysis, you should adopt one habit early on: document, document, document as much of the information that you acquire as reasonably possible. It’s horrible to think about, but your coworkers might get hit by a bus tomorrow, and every bit of information in their heads will be rendered useless while they recover. Less morbidly (I guess, depending on how you feel about your coworkers), if a project team member decides to take a month off, someone else will have to take over his or her work (or you just might have to wait a month to make any progress, leading to long, working weekends). So, you should document, document, document; do it during a meeting and/or immediately after it. Without documentation, you will quickly risk forgetting vital details. It’s imperative that you don’t try to keep everything in your head, because even people with the best memories tend to forget the details of a project (especially if they’re hit by that bus I talked about earlier).

The following are a few helpful tips as you begin to take notes on users’ needs:

  • Try to maintain a set of documents that will share system requirement and specification information. Important documents to consider include design-meeting notes, documents describing verbal change requests, and sign-offs on all specifications, such as functional, technical, testing, and so on.
  • Beyond formal documentation, it’s important to keep the members of your design team up to date and fully informed. Develop and maintain a common repository for all the information, and keep it up to date.
  • Note anywhere that you add information that the users haven’t given you or outwardly agreed to.
  • Set the project’s scope early on, and keep it in mind at all times. This will prevent the project from getting too big or diverse to be achievable within a reasonable period of time and within the budget. Hashing out changes that affect the budget, particularly when it will increase the budget, early in the process will avoid future animosity.

Once you document something, a crucial step follows: make sure the client agrees with your version of the documentation. As you go through the entire system design process, the clients will no doubt change their minds on entities, data points, business rules, user interface, colors—just about anything they can—and you have to prepare yourself for this. Whatever the client wants or needs is what you have to endeavor to accomplish. The client is in ultimate control of the project, which unfortunately often means communicating through a third party like a project manager and being flexible enough to run with any proposed changes, whether minor or major. What can be even worse is that you may not get to deal with the client directly at all, but only through a project manager. This setup initially sounds great, because you think the project manager will translate for you and be on the side of quality and correctness, and sometimes this is true. But often, the manager will mistranslate a client desire into something quite odd and then insist that it is the client’s desire. “I need all of the data on one screen” gets translated into “I need all of the data in one table.” Best case is that the manager realizes who the technical people are and who have business needs. If you have a typical job, worst case is what you experience every day.

Of course, it is a reality that clients change their minds, and sometimes, it seems to be more than a daily occurrence. Most frequently, they want more and more features. The common term for this is “scope creep.” The best way to avoid conflict is to make sure you get your client’s approval at regular stages throughout the design process. This is sometimes known as the principle of CYA, which I think has something to do with covering all your bases, though the letters may have a more interesting meaning. If you have no written history, telling your client that they are wrong is translated to “liar!” With documentation it translates to a far nicer “it is OK, I forget stuff too.” Users are humans too!

In addition to talking to the client, it’s important to acquire as many notes, printouts, screen shots, CD-ROMs loaded with spreadsheets, database backups, Word documents, e-mails, handwritten notes, and so on that exist for any current solution to the problem. This data will be useful in the process of discovering data elements, screens, reports, and other elements that you’ll need to design into your applications. Often, you’ll find information in the client’s artifacts that’s invaluable when putting together the data model.

image  Tip  Throughout the process of design and implementation, you’ll no doubt find changes to the original ­requirements. Make sure to continue to update your documentation, because the most wonderfully written and formatted documentation in the world is useless if it’s out of date.

Gathering Requirements

Without the designer having a solid understanding of the requirements, the system will essentially be based on guesses. Imagine your parent or spouse asking you to paint a room. Would that be enough to get started? What color paint? What type of paint? (Did you know you can buy paint that turns a wall into a white board?) There are many, many questions you would immediately ask for such a simple problem. If you ever pay people to paint your house, the painters will go over every detail of the process with you, from paint to trim, making sure they know what you desire. Painting a house takes a lot of skill, but the overall process is so much less complex than even the most simple computer system. However, a lot of people take about the same amount of time gathering requirements for a complex program (and it takes a day or two to paint a house).

It isn’t necessary to gather every requirement about every area of a large system initially; the system can be broken down into portions, often referred to as subject areas. The size of the subject area is based on the needs of the team and development methodology used. For example, the Scrum methodology breaks down everything into (generally speaking) 30-day units for designing, coding, and testing, while something like the waterfall methodology would expect you to design the entire system first and then start coding. If Scrum were your team’s methodology, the subject area might be small, function-oriented subject areas based on a set of user needs. Or you might break things down into larger functional areas, such as an accounts-payable module in an accounting system or a user-management module for a web site. The important thing is that all development methodologies will tell you one thing: design before you code. Sure, there is a good amount of variability when it comes to how much design you need, but you still don’t start typing until you know where you are going.

For gathering requirements, there are many tools and methodologies for documenting processes, business rules, and database structures. The Unified Modeling Language (UML) is one possible choice and there are also several model types in the IDEF family of methods; we will cover the data modeling technique in Chapter 3. I’ll employ the Entity-Relationship (E-R) modeling method IDEF1X to model databases. I won’t be covering any of the other modeling languages for the nondatabase structure parts of the project but will rather be using a simple manual spreadsheet method, which is by far the most common method of documenting requirements—even in medium-sized organizations where spending money on documentation tools can be harder than teaching your cat to make good word choices when playing Scrabble.

Regardless of the tools used to document the requirements, the needs for the database design process are the same. Specifications need to be acquired for the following:

  • Entities and relationships
  • Attributes and domains
  • Business rules that can be enforced in the database
  • Processes that require the use of the database

Without these specifications, you’ll either have to constantly go back to the clients and ask a bunch of questions (which they will sometimes answer three different ways for every two times they are asked, teaching you discernment skills) or start making guesses. Although guessing wrong a few times is a good education in how not to do things, it’s certainly no way to work efficiently (unless you happen to be the Amazing Kreskin and guess right 99.9% of the time, though I am pretty sure it was a trick and he had done his requirements gathering as well).

As a major part of the process of implementing a database system, the data architect’s goal will be to produce a graphical model of the database. (As stated in the previous chapter, I’ll be using IDEF1X–based data model diagrams, but you can use whatever methodology your tool supports, even if it is just a piece of paper and a No. 2 pencil.)

image  Tip   During the early parts of a project, figure out the “what” and “why” first; then you can work on the “how.” Once you know the details of what needs to be built, the process to get it built will be reasonably natural, and you can possibly apply preexisting patterns to the solution.

Vagueness may cause unnecessary discussions, fights, or even lawsuits later in the process. So, make sure your clients understand what you’re going to do for them, and use language that will be clearly understood but that’s specific enough to describe what you learn in the information gathering process.

Throughout the process of discovery, artifacts will be gathered and produced that will be used throughout the process of implementation as reference materials. Artifacts are any kind of documents that will be important to the design, for example, interview notes, e-mails, sample documents, and so on. In this section, I’ll discuss the some of the main types of artifacts or activities that you will need to be very interested in as a database architect:

  • Client interviews
  • Prototypes and existing systems
  • Various other types of documentation

By no means is this an exhaustive list of where to find and acquire documentation; in fact, it’s far from it. The goal is simply to get your mind clicking and thinking of information to get from the client so your job will be easier.

Interviewing Clients

It might be the case that the person designing the data storage (commonly referred as the data architect) will never meet the user, let alone be involved in formal interviews. The project manager, business analyst, and/or system architect might provide all the required information. Other projects might involve only a data architect or a single person wearing more hats than the entire Fourth Army on maneuvers. I’ve done it both ways: I’ve been in the early design sessions, and I’ve worked from documentation. The better the people you work with, the more favorable the latter option is. In this section, I’ll talk quickly about the basics of client interviews, because on almost any project, you’ll end up doing some amount of interviewing the client.

Client interviews are commonly where the project really gets started. It’s where the free, unstructured flow of information starts. However, it’s also where the communication gap starts. Many clients generally think visually—in terms of forms, web pages, and perhaps in simple user interfaces. Users also tend to think solely from their own perspective. For example, they may use the word “error” to denote why a process did not run as they expected. These error conditions may not only be actual errors but choices the user makes. So a value like “scheduled maintenance” might be classified as an error condition. It is very much up to the people with “analyst” embroidered on the back of their hats to analyze what users are actually asking for.

As such, the job is to balance the customers’ perceived wants and needs with their real need: a properly structured database that sits nicely behind a user interface and captures what they are really after, specifically information to make their business lives easier and more lucrative. Changing a form around to include a new text box, label, or whatever is a relatively simple task, giving the user the false impression that creating the entire application is an easy process. If you want proof, make the foolish mistake of demonstrating a polished-looking prototype application with non-hard-coded values that makes the client think it actually works. The clients might be impressed that you’ve put together something so quickly and expect you to be nearly done. Rarely will they understand that what exists under the hood—namely, the database and the middle-tier business objects—is where all the main work takes place.

image  Tip   While visual elements are great places to find a clue to what data a user will want, as you go along in the process, you’ll want to be careful not to center your database design too heavily around a particular interface. The structure of the data needs to be dictated by what the data means, not on how it will be presented. Presentation is more of an interface design task, not a database design one.

Brainstorming sessions with users can also yield great results for gathering a lot of information at once, as long as the group doesn’t grow too large (if your meeting requires an onsite caterer for lunch, you are not going to make any great decisions). The key here is to make sure that someone is facilitating the meeting and preventing the “alpha” person from beating up on the others and giving only his or her own opinion (it is even worse if you are that alpha person!). Treat information from every person interviewed as important, because each person will likely have a different, yet valuable viewpoint. Sometimes (OK, usually) the best information comes not from the executive, but from the person who does the work. Don’t assume that the first person speaks for the rest, even if they’re all working on the same project or if this individual is the manager (or even president or owner of a major corporation, though a great amount of tact is required sometimes to walk that tightrope).

In many cases, when the dominant person cannot be controlled or the mousey person cannot be prodded into getting involved, one-on-one sessions should probably be employed to allow all clients to speak their minds, without untimely interruptions from stronger-willed (though sometimes not stronger-minded) colleagues. Be mindful of the fact that the loudest and boldest people might not have the best ideas and that the quiet person who sits at the back and says nothing might have the key to the entire project. Make sure to at least consider everybody’s opinions.

This part of the book is written with the most humility, because I’ve made more mistakes in this part of the design process than any other. The client interview is one of the most difficult parts of the process that I’ve encountered. It might not seem a suitable topic for experienced analysts, but even the best of us need to be reminded that jumping the gun, bullying the clients, telling them what they want before they tell you, and even failing to manage the user’s expectations can lead to the ruin of even a well-developed system. If you have a shaky foundation, the final product will likely be shaky as well.

Asking the Right Questions

When painting a house, there are a set of questions that the painting company’s representative will ask every single one of their clients (colors to use? rooms to paint?). The same can go for almost any computer software project. In the following sections are some questions that are going to be important to the database design aspects of a system’s development. Clearly, this is not going to be an exhaustive list, but it’s certainly enough to get you started, so at a minimum, you won’t have to sit in a room one day with no idea about what to say.

What Data Is Needed?

If the data architect is part of the design session, some data is clearly needed for the system. Most users, at a high level, know what data they want to see out of the system. For example, if they’re in accounting, they want to see dollars and cents summarized by such-and-such a group. It will be very important at some time in your process to differentiate between what data is needed and what would just be nice to have. Whenever you start to do your design/implementation project plans (an activity that is 30% less fun than going to the dentist for a cleaning, and 10% more pleasant than a root canal), you can focus on the necessities and keep the wants in your pocket for later.

How Will the Data Be Used?

Knowing what your client is planning to use the data in the system for is an important piece of information indeed. Not only will you understand the processes that you will be trying to model, but you can also begin to get a good picture of the type of data that needs to be stored.

For example, imagine you’re asked to create a database of contacts for a dental office. You might want to know the following:

  • Will the contact names be used just to make phone calls, like a quick phone book?
  • Will the client be sending e-mail or posting to the members of the contact lists? Should the names be subdivided into groups for this purpose?
  • Will the client be using the names to solicit a response from the mail, such as appointment reminders?
  • Is it important to have family members documented? Do they want to send cards to the person on important dates?

Usage probably seems like it would be out of bounds early in the design process, and in some ways, you would be right. But in broad strokes, usage information is definitely useful. Information about the types of processes where data might be used is important, but what screen it might show up on is less so. For example, take addresses. If you just capture them for infrequent usage, you might only need to give the user a single string to input an entire address. But if your business is mailing, you may need to format it to your post office’s exact specifications, so you don’t have to pay the same postage rates as the normal human beings.

What Rules Govern the Use of the Data?

Almost every piece of data you are going to want to store will have rules that govern how it is stored, used, and accessed. These rules will provide a lot of guidance to the model that you will produce. As an example, taking our previous example of contacts, you might discover the following:

  • Every contact must have a valid e-mail address.
  • Every contact must have a valid street address.
  • The client checks every e-mail address using a mail routine, and the contact isn’t a valid contact until this routine has been successfully executed.
  • Contacts must be subdivided by the type of issues they have.
  • Only certain users can access the e-mail addresses of the contacts.

It’s important to be careful with the verbiage of the rules gathered early in the process. Many times, the kinds of rules you get seem pretty straightforward when they are written down, but the reality is quite often not so simple. It is really important as you are reviewing rules to confirm them with the analyst and likely directly with the client before assuming them to be true.

As a case in point, what is a “valid” e-mail address? Well, it’s the e-mail address that accurately goes with the contact. Sure, but how on Earth do you validate that? The fact is that in many systems you don’t. Usually, this is implemented to mean that the string meets the formatting for an e-mail address, in that it has an ampersand character between other characters and a dot (.) between one or more alphanumeric values (such as %@%.%, plus all characters between A and Z, 0 and 9, an underscore, and so on), but the value is completely up to interpretation. On the other hand, in other types of systems, you actually require the user to pick up some information from the e-mail to validate that it is, indeed, a working e-mail address and that the person who entered the data has rights to it. It is very much up to the needs of the system, but the English question can easily be put using the exact same words.

The real problem comes when you too-strictly interpret rules and your final product ends up unacceptable because you’ve placed an overly restrictive rule on the data that the client doesn’t want or you’ve missed a rule that the client truly needs. I made this mistake in a big way once, which torpedoed a system for several weeks early in its life. Rules that the clients had seemingly wanted to be strictly enforced needed to be occasionally overridden on a case-by-case basis, based on their clients’ desires. Unfortunately, our program didn’t make it possible for the user to override these rules, so teeth were gnashed and sleep was lost fixing the problem (even worse, it was a fixed-bid contract where these kinds of overages meant no more money, completely eating away any possible bonus. And no, I didn’t make that deal).

Some rules might have another problem: the client wants the rule, but implementing it isn’t possible or practical. For example, the client might request that all registered visitors of a web site have to insert a valid mobile phone number, but is it certain that visitors would provide this data? And what exactly is a valid mobile number? Can you validate that by format alone, or does the number have to be validated by calling it or checking with the phone company? What if users provide a landline instead? Implementability is of no concern at this point in the process. Someone will have to enforce the rule, and that will be ironed out later in the process.

What Data Is Reported On?

Reports are often one of the most frequently forgotten parts of the design process, yet in reality, they are almost certainly the most important part of the project to the client. Usually, the thing that makes a system profitable is the ability to report on all of the activity in the system. How productive different parts of the organization are, how effective sales persons are, and so on are a large part of the “why” that make computer systems worthwhile for even very small companies.

Many novice developers leave designing and implementing reports until the last minute (a mistake I’ve made more than once over the years). For the user, reports are where data becomes information and are used as the basis of vital decision making and can make or break a company.

Looking back at the contact example, what name does the client want to see on the reports? The following items come to mind:

  • First name, last name
  • First name, middle name, last name
  • Last name, first name
  • Nickname

It’s important to try to nail down such issues early, no matter how small or silly they seem to you at this point. They’re important to the client, who you should always remember is paying the bill. And frankly, the most important rule for reporting is that you cannot report on data that you do not capture.

From a database design standpoint, the content of reports is extremely important, because it will likely help to discover data requirements that aren’t otherwise thought of. Avoid being concerned with the ascetics of the reports yet, because that might lead to the temptation of coding and away from modeling.

image  Tip   Don’t overlook any existing reports that might have a corresponding report in the new system. Rarely should you just duplicate old reports, but the content will likely include data that the client may never even think about when they’re expressing needs. There will often be hundreds of reports currently in production, and in the new system, there is little doubt that the number will go up, unless many of the reports can be consolidated.

Where Is the Data Now?

It is nice once in a while to have the opportunity to create a totally new database with absolutely no preexisting data. This makes life so easy and your job a lot of fun. Unfortunately, as years pass, finding a completely new system to implement gets less likely than the Cubs winning the World Series (no offense to Cubs fans, but hey, it is what it is). The only likely exception is when building a product to be sold to end users in a turnkey fashion (then the preexisting data is their problem, or yours if you purchase their system). For almost every system I have worked on, I was creating a better version of some other system, so we had to consider converting existing data that’s important to the end users. (Only one major system was a brand new system. That was a wonderful experience for many reasons; not only didn’t we have to deal with data conversion but we didn’t have to deal with existing processes and code either.)

Every organization is different. Some have data in one centralized location, while others have it scattered in many (many) locations. Rarely, if ever, is the data already in one well-structured database that you can easily access. If that were the case, why would the client come to you at all? Clients typically have data in the following sundry locations:

  • Mainframe or legacy servers: Millions of lines of active COBOL still run many corporations.
  • Spreadsheets: Spreadsheets are wonderful tools to view, slice, and dice data but are wildly inappropriate places to maintain complex data. Most users know how to use a spreadsheet as a database but, unfortunately, are not so experienced in ensuring the integrity of their data, so this data is undoubtedly going to give you a major headache.
  • Desktop databases such as Microsoft Access: Desktop databases are great tools and are easy to deploy and use. However, this ease of use often means that these databases are constructed and maintained by nontechnical personnel and are poorly designed, potentially causing many problems when the databases have to be enlarged or modified.
  • Filing cabinets: Even now, in the twenty-first century, many companies still have few or no computers used for anything other than playing solitaire and instead maintain stockpiles of paper documents. Your project might simply be to replace a filing cabinet with a computer-based system or to supply a simple database that logs the physical locations of the existing paper documents.

Data that you need to include in the SQL Server database you’re designing will come from these and other weird and wonderful sources that you discover from the client (truth is commonly stranger than fiction). Even worse, spreadsheets, filing cabinets, and poorly designed computerized databases don’t enforce data integrity (and often desktop databases, mainframe applications, and even existing SQL Server databases don’t necessarily do such a perfect job either), so always be prepared for dirty data that will have to be cleaned up before storage in your nice new database.

Will the Data Need to Be Integrated with Other Systems?

Once you have a good idea of where the client’s important data is located, you can begin to determine how the data in your new SQL Server solution will interact with the data that will stay in its original format. This might include building intricate gateway connections to mainframes, linking server connections to other SQL Servers or Oracle boxes, or even linking to spreadsheets. You can’t make too many assumptions about this topic at this point in your design. Just knowing the architecture you’ll need to deal with can be helpful later in the process.

image  Tip   Never expect that the data you will be converting or integrating with is going to have any quality. Too many projects get their start with poor guesses about the effort required, and data cleanup has been the least well-guessed part of them all. It will be hard enough to understand what is in a database to start with, but if the data is bad, it will make your job orders of magnitude more difficult. If you have promised to do the work for $1,000 and it ends up taking 500 hours, you would have been better off with a spatula working midnights near the Eiffel Tower . . . in Paris, Tennessee.

How Much Is This Data Worth?

It’s also important to place value judgments on data. In some cases, data will have great value in the monetary sense. For example, in the dental office example that will be presented later in Chapter 4, the value lies in the record of what has been done to the patient and how much has been billed to the patient and his or her insurance company. Without this documentation, digging out this data to eventually get paid for the work done might take hours and days. This data has a specific monetary value, because the quicker the payment is received, the more interest is drawn, meaning more profits. If the client shifts the turnover of payments from one month to one week because of streamlining the process, this might be worth quite a bit more money.

On the other hand, just because existing data is available doesn’t necessarily mean that it should be included in the new database. The client needs to be informed of all the data that’s available and should be provided with a cost estimate of transferring it into the new database. The cost of transferring legacy data can be high, and the client should be offered the opportunity to make decisions that might conserve funds for more important purposes.

Who Will Use the Data?

Who is going to use the data probably doesn’t instantly jump out at you as a type of data that needs to be considered during the early stages of requirement gathering. When designing an interface, usually who is going to actually be pushing the button probably doesn’t make a lot of difference to the button design (unless disabilities are involved in the equation perhaps). Yet, the answer to the question of “who” can start to answer several different types of questions:

  • Security: “Who will use the data?” can be taken two ways. First, these are the only people who care about the data. Second, these are the only people who are privileged to use the data. The latter will require you to create boundaries to utilization. For fun, add in privacy laws like the Health Insurance Portability and Accountability Act (HIPAA) or Sarbanes–Oxley Act (SOX) or any of 100 other well-meaning laws that punish the dba more than the offenders.
  • Structure: If multiple user groups need the same data, but for particularly different needs, this could lead to different possible design outcomes later in the process.
  • Concurrency: The design considerations for a system that has one user are different from ten, or a hundred, thousand, and so on. The number of users should not change our conceptual or logical designs, but it certainly can effect how we design the physical hardware layer. Concurrency is something we won’t make a lot of reference to until very late in this book (Chapter 10, but this is the point in time when you are doing the asking (and likely purchasing the hardware, so it doesn’t hurt to find out now).

This choice of who will use the data goes hand in hand with all of the other questions you have gotten answered during the process of gathering requirements. Of course, these questions are just the start of the information gathering process, but there is still a lot more work to go before you can start building a database, so you are going to have to cool your jets a bit longer.

Working with Existing Systems and Prototypes

If you’re writing a new version of a current database system, access to the existing system is going to be a blessing and a curse. Obviously, the more information you can gather about how any previous system and its data was previously structured, the better. All the screens, data models, object models, user documents, and so on are important to the design process.

However, unless you’re simply making revisions to an existing system, often the old database system is reasonable only as a reference point for completeness, not as an initial blueprint. On most occasions, the existing system you’ll be replacing will have many problems that need to be fixed, not emulated. If the system being replaced had no problems, why is the client replacing it? Possibly just to more to newer technology but no one replaces a working system just for kicks.

Prototypes from the early design process might also exist. Prototypes can be useful tools to communicate how to solve a real-world problem using a computer or when you’re trying to reengineer how a current process is managed. Their role is to be a proof of concept—an opportunity to flesh out with the design team and the end users the critical elements of the project on which success or failure will depend.

The real problem with prototypes is that if a database was created for the prototype, it is rarely going to be worth anything. So, by the time database design starts, you might be directed to take a prototype database that has been hastily developed and “make it work” or, worse yet, “polish it up.” Indeed, you might inherit an unstructured, unorganized prototype, and your task will be to turn it into a production database in no time flat (loosely translated, that means to have it done early yesterday).

It may be up to you, at times, to remind customers to consider prototypes only as interactive pictures to get the customer to try out a concept, often to get your company to sign a contract. Sometimes, you might be hired to implement the prototype (or the failed try at a production system that’s now being called a prototype) that another consultant was hired to create (or worse yet, an employee who still works there and has a chip on his or her shoulder the size of a large African elephant that spends all day eating bonbons and watching soap operas). It’s better to start from scratch, developing the final application using structured, supported design and coding standards. As a data architect, you must work as hard as possible to use prototype code only as a working model—a piece of documentation that you use to enhance your own design. Prototypes help you to be sure you’re not going to miss out on any critical pieces of information that the users need—such as a name field, a search operation, or even a button (which might imply a data element)—but they may not tell you anything about the eventual database design at all.

Utilizing Other Types of Documentation

Apart from interviews and existing systems, you can look to other sources to find data rules and other pieces of information relevant to the design project. Often, the project manager will obtain these documents; sometimes, they will not be available to you, and you just have to take someone else’s word for what is in them. In these cases, I find it best to get into writing your understanding and make sure it is clear who said what about the meaning of documentation you cannot see. Ordinarily, I try not to worry about the later blame factor, but it is essential to worry when you are working from a translation that may later come under dispute.

And as always, the following list is certainly not exclusive but should kick start your thinking about where to get existing documentation for a system you are creating or replacing.

Early Project Documentation

If you work for a company that is creating software for other companies, you’ll find that early in the project there are often documents that get created to solicit costs and possible solutions, for example:

  • Request for quote (RFQ): A document with a fairly mature specification that an organization sends out to determine how much a solution would cost
  • Request for proposal (RFP): For less mature ideas for which an organization wants to see potential solutions and get an idea about its costs

Each of these documents contains valuable information that can help you design a solution, because you can get an idea of what the client wanted before you got involved. Things change, of course, and not always will the final solution resemble the original request, but a copy of an RFP or an RFQ should be added to the pile of information that you’ll have available later in the process. Although these documents generally consist of sketchy information about the problem and the desired solution, you can use them to confirm the original reason for wanting the database system and for getting a firmer handle on what types of data are to be stored within it.

No matter what, if you can get a copy of these documents, you’ll be able to see the client’s thought pattern and why the client wants a system developed.

Contracts or Client Work Orders

Getting copies of the contract can seem like a fairly radical approach to gathering design information, depending on the type of organization you’re with. Frankly, in a corporate structure, you’ll likely have to fight through layers of management to make them understand why you need to see the contract at all. Contracts can be inherently difficult to read because of the language they’re written in (sort of like a terse version of a programming language, with intentional vagueness tossed in to give lawyers something to dispute with one another later). However, be diligent in filtering out the legalese, and you’ll uncover what amounts to a basic set of requirements for the system—often the requirements that you must fulfill exactly or not get paid. Even more fun is the stuff you may learn that has been promised that the implementation team has never heard of.

What makes the contract so attractive is simple. It is, generally speaking, the target you’ll be shooting at. No matter what the client says, or what the existing system was, if the contract specifies that you deliver some sort of watercraft and you deliver a Formula 1 race car because the lower-level clients change their minds without changing the contract, you might not get paid because your project is deemed a failure (figuratively speaking, of course, since who doesn’t like Formula 1 race cars?).

Level of Service Agreement

One important section of contracts that’s important to the design process is the required level of service. This might specify the number of pages per minute, the number of rows in the database, and so on. All this needs to be measured, stored, tested for, and so on. When it comes to the testing and optimization phases, knowing the target level of service can be of great value. You may also find some data that needs to be stored to validate that a service level is being met.

Audit Plans

Don’t forget about audits. When you build a system, you must consider whether the system is likely to be audited in the future and by whom. Government, ISO 9000 clients, and other clients that are monitored by standards organizations are likely to have strict audit requirements. Other clients may also have financial audit processes. Of particular concern are all the various privacy policies, child data restrictions, credit card encryption rules, and so on. All of these will not only require that you follow rules that regulatory bodies set but that you document certain parts of your operation. These audit plans might contain valuable information that can be used in the design process.

Following Best Practices

The following list of some best practices can be useful to follow when dealing with and gathering requirements:

  • Be diligent: Look through everything to make sure that what’s being said makes sense. Be certain to understand as many of the business rules that bind the system as possible before moving on to the next step. Mistakes made early in the process can mushroom later in the process.
  • Document: The format of the documentation isn’t really all that important, only that you get documented as much of what the client wants as possible. Make sure that the documentation is understandable by all parties involved and that it will be useful going forward toward implementation.
  • Communicate: Constant communication with clients is essential to keep the design on track. The danger is that if you start to get the wrong idea of what the client needs, every decision past that point might be wrong. Get as much face time with the client as possible.

image  Note   The mantra “review with client, review with client, review with client” is probably starting to get a bit old at this point. This is one of the last times I’ll mention it in this chapter at least, but it’s so important that I hope it has sunk in.

Summary

In this chapter, I’ve introduced some of the basics of documentation and requirements gathering. This is one of the most important parts of the process of creating software, because it’s the foundation of everything that follows. If the foundation is solid, the rest of the process has a chance. If the foundation is shoddy, the rest of the system that gets built will likely be the same. The purpose of this process is to acquire as much information as possible about what the clients want out of their system. As a data architect, this information might be something that’s delivered to you, or at least most of it. Either way, the goal is to understand the user’s needs.

Once you have as much documentation as possible from the users, the real work begins. Through all this documentation, the goal is to prepare you for the next step of producing a data model that will document in a very formal manner of the following:

  • Entities and relationships
  • Attributes and domains
  • Business rules that can be enforced in the database
  • Processes that require the use of the database

From this, a conceptual data model will emerge that has many of the characteristics that will exist in the actual implemented database. In the upcoming chapters, the database design will certainly change from this conceptual model, but it will share many of the same characteristics.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.144.194