Chapter 3. Continuous integration

This chapter covers

  • Setting up continuous integration
  • Closing the feedback loop
  • Updating the check-in dance

In the previous chapter, we set up a version control repository to enable all developers to have the latest version of the code. Now we’re going to tackle a problem common in all software development projects: integration.

Large software projects are often split into separate components as part of the design, and it isn’t until late in the project that the components are integrated into a single cohesive solution. Often, this integration is a massive undertaking in and of itself and may even take several weeks and require changes to existing code in order to make it work.

When you integrate your code regularly as an ongoing part of your process, you discover problems and incompatibilities—pollution—much earlier. This technique, called continuous integration (CI), results in more streamlined and automated builds, especially when it’s part of the process from the beginning.

Brownfield apps often suffer from many of the problems CI seeks to prevent. Fortunately, it’s possible to implement CI when you take on a brownfield project and quickly reap the same benefits you’d see with a greenfield project.

In this chapter, we’ll introduce the core techniques of CI and show you how to apply them to a brownfield codebase. If you’re not familiar with CI, you’ll also discover how to apply it to new or early-stage projects.

Before we go into details on the process, let’s look at some other pain points.

3.1. Pain points

We’ve already discussed one of the common pain points you can address through implementing continuous integration. Integrating a project’s components at the last minute can be an arduous process. Last-minute integration is particularly prevalent in brownfield applications because of the way the code in these projects weaves together. It’s common to find a few large, central classes, such as a single data access class whose static methods provide all access to the database, making it unavoidable for developers’ work to overlap.

Just consider a team of developers continuously checking in code for months and months. What are the odds that one developer’s changes will cause another’s to fail? Ninety percent? One hundred percent?

Let’s say we’re working on an application and one developer, Huey, is working on a screen that requires a grid. As a diligent developer, Huey knows the company has a license for a third-party grid and incorporates that into the application. He installs the grid, adds a reference to it from the project, and checks in the code.

Now consider another developer, Lewis. Lewis comes in the next morning and retrieves the latest version of the code. And because Lewis doesn’t have the grid installed, the code won’t compile. All of a sudden, Lewis needs to do some investigative work before he can even start on his day’s work. See Figure 3.1.

Figure 3.1. Adding a component to version control that requires developers to install something first is a source of friction.

Here’s another scenario. Consider this class:

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime Birthdate { get; set; }
}

In this scenario, let’s say Michael wants to refactor the class so that the LastName property is called Surname as the team attempts to better match how the business thinks of the object. He changes the property and diligently updates all references to it in the rest of the code.

Now assume another developer, Jackson, has been working (not quite as diligently as Michael) on a screen to edit the Person class. He finishes the screen and checks in the code, but forgets to get the latest version before doing so. As such, his code uses the now-defunct LastName property and is incompatible with Michael’s recent changes.

The next person to retrieve the latest version of the code will have a problem. The application no longer compiles. So she’ll have to wear the detective hat as she attempts to find out why the application won’t compile.

These are only a couple of the problems that CI can solve. Now let’s take a look at what CI is and how it can mitigate these problems.

3.2. What is continuous integration?

In Martin Fowler’s seminal article on the topic,[1] he defines CI as follows:

1http://martinfowler.com/articles/continuousIntegration.html

Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily—leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly.

If you aren’t familiar with the practice of CI, this can seem like a lofty goal. The key to the process is the automated build, whereby the application can be built in its entirety automatically and at any time. Once an automated build can be achieved, CI becomes a process that can be scheduled, launched manually, or launched based on some external event (a common one is to launch the CI process each time code is checked into the source code repository).


Culture shift

As you read this section, bear in mind that the challenges behind CI are more cultural than technical. Incorporating CI into your process may require some adjustments in your team’s mentality. Few people like to be reminded constantly when they’ve made mistakes. Early in a brownfield project, you may encounter resistance as the CI process reports issues on a regular and frequent basis, because developers aren’t used to having their work double-checked on every check-in.

Countering this resistance is a question of changing your mindset. The point of CI isn’t to identify more problems. All it does is identify existing problems earlier, before they reach the testing team or the client. CI won’t find all of them, of course, but it does add an important safety net. In our experience, you’ll find that safety net adds to the confidence level of your development team and keeps them focused on code rather than process.


The ultimate reason for CI is to find problems quickly. We call this “closing the feedback loop.”

3.2.1. The feedback loop

It’s inevitable that problems will arise in your code. Whether it’s a bug or a configuration issue or an environmental problem, you won’t get everything right the first time. On many brownfield projects, just glancing over the defect list will turn up plenty of issues that could have been caught earlier if the team had had a dedicated process to integrate all the code on a regular basis before releasing to the testing team. Furthermore, when combined with automated tests and code metrics (see chapters 4 and 5), CI becomes a powerful tool in your toolkit.

Most of the time, these problems are compilation errors and are discovered by the development team. If your code doesn’t compile, that’s a problem. But you get feedback on it almost instantly courtesy of Visual Studio. The IDE tells you right away that your code has something wrong with it, and you fix it.

Imagine that you built your application, compiled it, and released it to the customer without ever running it. It would almost certainly fail and you wouldn’t have a happy client. Generally speaking, having your client discover errors is a bad thing.

To guard against this issue, most developers will run the application locally before handing it off. They’ll open it, run through some test cases to see if it works, and then pass it on when they deem it working. If any major issues pop up during their quick tests, they think “Phew! Glad we discovered them early.”

3.2.2. Closing the feedback loop

That’s the basic idea behind the feedback loop. It’s the amount of time between when a problem is created and when it’s discovered. In the worst-case scenario, the client discovers it. In the best, it never leaves the development stage. Many brownfield projects suffer from a wide feedback loop. Given that problems will occur, we’d like to discover them as soon as possible; we want to close the feedback loop. If we don’t, the problem will only grow as it progresses to the client, as shown in figure 3.2.

Figure 3.2. As a problem or bug makes its way through each stage in a project’s life cycle, the effort and cost required to resolve the bug increase by an order of magnitude.

Closing the feedback loop is essentially why we have testing teams in place. They act as a buffer between the development team and the client. One of their primary objectives is to discover problems before they reach the client.

CI is, in effect, a way to close the feedback loop for the project team. Instead of being informed about bugs days, weeks, or months after they were created, from either the testing team or the client, feedback surfaces within minutes. Because CI becomes a way of life on a project, problems that arise are dealt with immediately.

As we said earlier, one of the key aspects of CI is automating your build process. We’ll look at that process next.

3.3. Automated builds and the build server

Automation is one of the cornerstones of CI (and indeed, our entire industry). Without automation, CI is almost impossible. You should strive to automate as much of the build process as possible, from compilation, to testing, to releases.

Theoretically, automation could be accomplished with batch files. You could create one to retrieve the latest version of the application from version control and compile the application. This batch file could then be scheduled to run, say, every 10 minutes. After each run, it could send an email detailing the results of the test.


Note

Remember in chapter 2 when we went to the trouble of setting up your folder structure so that it was self-contained? We’ll let you in on a little secret: that wasn’t just for the benefit of your new developers. It also makes your CI process run that much more smoothly.


A batch file satisfies the definition of CI, if not the spirit. Luckily, there are a number of tools available to make this automation easier than creating batch scripts manually. (The appendix lists some of the tools available to assist with automating your build.)


Warning

The CI process doesn’t replace your QA department. Never forget that compiling code only means the application passes a few syntax rules. Similarly, passing tests means only that the code is meeting the expectations of the tests as they were written. These are not the same thing as doing what the users expect. Quality assurance is still required for the big picture, to make sure everything works well as a whole.


The CI process requires a separate machine to act as the build (or integration) server. This machine should resemble the production environment as closely as is reasonable, save for any software required to automate the build.

The purpose of the build server is to act as the mainline. The automated build process could run and pass on any number of developers’ machines, but until the build successfully runs on the build server, the application isn’t considered to be in a stable state.

Furthermore, a failed build on your build server is a serious offense. It means that at that exact moment, your application doesn’t work in some fashion. If development were to stop right then, you wouldn’t have a working application.

Don’t let the fact that it’s called a server scare you. A spare developer box in a corner of someone’s office is usually sufficient to act as a build server. The important criteria are that it not be actively used by anyone and that it reasonably resemble an end user’s configuration.

The build server should be free of any special software, especially development IDEs (such as Visual Studio) and third-party installations, if possible. That isn’t a hard-and-fast rule because you may find it necessary to install, for example, database software to store databases that are created and dropped during integration tests.


No server? No problem!

If resources are tight or require an unholy amount of bureaucratic red tape to request, another option is to use virtualization. You use software (such as VMware Workstation or Virtual PC) to set up one or more instances of your environment running within an existing system.

Even if you have the option of unlimited hardware, virtual machines offer benefits as build servers. They’re easily created and readily disposable. Plus, they can be rolled back to a known state quickly. The primary concern with the build server’s performance is that it doesn’t slow down your development efforts. As a result, virtualization is a viable (and in many cases, preferred) alternative to a dedicated computer.


The reason a clean machine is preferred is that development machines can get polluted by the various tools, utilities, and scripts we like to install, uninstall, and reinstall. By using a clean computer as the build server, we can often discover environmental issues, such as an application that contains an errant dependency on the Visual Studio SDK.

Before we look at the details of an automated build process in section 3.5, let’s assume we have one in place somehow. How will this affect your check-in dance?

3.4. An updated check-in dance

In chapter 2, we talked about the check-in dance:

1. Check out the file(s) you wish to modify.
2. Make your changes.
3. Ensure the application compiles and runs as expected.
4. Get the latest version of the application.
5. Ensure again that the application compiles and runs as expected.
6. Check in your changes.

After adding a CI process to the application, we need to add one more step:

7. Verify that the build server executes without error.

The updated dance is shown in figure 3.3.

Figure 3.3. After adding continuous integration, you need to add a step to the end of the check-in dance to allow the CI server to do its work.

The reason for the additional step is that you aren’t truly finished until the build machine is able to compile and test the application. You may have forgotten to add a class or a new third-party library to version control, for example. Such an error will be caught when the build machine attempts to compile the application.

As you’ll recall, this is one of the scenarios from the introduction to this chapter, where there was no CI process and Huey had checked in a new component that required an installation. The problem didn’t manifest itself until the next developer, Lewis, retrieved the latest version of the code. That next developer may be retrieving the latest version as part of his check-in dance, which means he has changes that need to be checked in. Instead of being focused on committing his changes to the version control system, he must deal with tracking down a missing reference before his code can be checked in.

Note that with most automated build software, step 7 of the check-in dance is performed automatically for you by the build server. There’s no need for you to do it manually. The build server would be configured to trigger this step whenever a change is made to the source code repository. But even so, your work is still not done after you complete step 6. Rather, step 7 becomes

7. Wait until you receive confirmation from the build server that there are no errors.

You don’t continue coding until you know that your changes have not broken the build. This approach saves you from having to roll back any changes you make while the build is going on. Presumably, your process takes only 2 or 3 minutes at the most, so you usually aren’t sitting idle for long. Plus, this practice is more of a guideline than an actual rule. In reality, most developers don’t like to stop their rhythm after checking in code. In the beginning, when you are new to CI, this is a good guideline to adhere to. As you gain more confidence and check in more often, don’t be too draconian about enforcing this rule.

3.4.1. Breaking the build

If your changes have broken the build, you have some work to do because there are three important rules to follow when there’s a broken build:

  • Nobody shall check in code when the build is broken, unless it is specifically to fix the build.
  • Nobody shall retrieve the latest version of the code when the build is broken.
  • He who broke it, fixes it.

A broken build is a serious issue, at least on the build server. It means that the application won’t compile, has failing tests, or is otherwise in a state that’s unstable. As such, there’s no sense in retrieving code that doesn’t work or checking in unrelated code that could exacerbate the problem. The final rule is a matter of convenience. The person who checked in has the most knowledge about the problem and is in the best position to get the team back on track. He or she is allowed to enlist help, but ultimately the person whose check-in caused the problem should spearhead the effort to fix it.

There’s one final note on steps 3 and 5 of the check-in dance, now that we’re implementing a CI process: local compilation and testing of the application should be done in the same way it’s done on the build server. That probably means performing these steps outside the Visual Studio IDE via some form of batch script. As mentioned in section 3.3, the build server should be free of IDEs if at all possible, so it won’t be using Visual Studio to run the build. Besides, compiling the application in Visual Studio doesn’t lend itself well to automation.

Instead, if your build process involves executing a build script to compile the application, you should do the same to compile the application locally as part of the check-in dance. If you use a batch file on the build server to run your tests, use the same one on your local machine. If you use an automation tool, such as NAnt, Rake, or psake, use the same tool locally before checking in.
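For example, if the build server runs an NAnt script, each developer can invoke the same script from a command prompt before checking in. The build file and target names here are illustrative:

nant -buildfile:MyProject.build compile test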

The reason should be obvious: if you use a different process than the build server to compile or test your application, you run the risk of errors occurring on the build server that don’t occur on your local machine. These errors could be caused by differences in configuration, which files are included in the compilation process, which assemblies are tested, and any number of other things.


Challenge your assumptions: Don’t build in the IDE

There are a number of reasons that you may not want to build your application in your IDE. First, consider that Visual Studio performs a number of visual tasks to report the progress of the compilation. Each of those updates takes time, and when you’re working with a large codebase, or with a large number of projects in the solution, that time adds up. Compiling at the command line provides only a bare minimum of updates to the developer. As a result, compilation may be faster.

Also consider that you want to build your application locally in the exact same way as it will be built on the CI server and for release. If your CI server is building the application using a build script, IDE-based compilation can’t provide the same level of confidence that all is well.

Make the Alt-Tab keystroke out of Visual Studio and into the command-line window. The speed increase in your build process and the increased confidence in your local codebase may surprise you.


With that in mind, running the automated build locally is necessary only when you’re doing the check-in dance. How you and your fellow developers compile the application and run the tests during normal day-to-day development is entirely a matter of personal preference.

With all of this information now at hand, let’s revisit one of our pain point scenarios from section 3.1.

3.4.2. Removing the pain: a check-in walkthrough

Recall the changes that Michael and Jackson made in our scenario from section 3.1. Michael had changed the LastName property to Surname. Jackson added code that used the LastName property but didn’t get the latest version of the code before checking in. He skipped step 4 of the check-in dance, and the error fell to the next developer to discover and possibly fix.

Now let’s assume we have an automated build process in place. As soon as someone checks in code, it will do the following:

1. Get the latest version of the application
2. Compile it
3. Notify the team via email of the result

As soon as Jackson checks in his code, our automated build process kicks in and promptly fails the second step: compiling the code. The team gets an email that the code is broken and we’ve even configured the process to let us know who made the last check-in—in this case, Jackson.

Our rules for broken builds now come into play. Michael, Lewis, Huey, and the rest of the team stop what they’re doing. Maybe they make their way to the coffee station for a refresh. More often, they saunter over to Jackson’s cubicle for some good-natured ribbing. Perhaps they bestow upon him some symbol indicating he was the last person who broke the build, for example, a rubber duck (the “fail duck”). Jackson, alert to the lighthearted mockery about to befall him, realizes he forgot to get the latest version and quickly makes the required change. It’s a quick fix and before the first person has set foot in his cubicle, he’s made the change and performed the proper check-in dance. He can now await his fate with a clear conscience.

The point is, amid all this camaraderie, the problem is discovered almost immediately after it was caused. As a result, it is fixed in very little time. That’s the power of CI.

Also, note that despite your best efforts, it is still possible to break the build, even when performing the check-in dance. It’s a matter of timing. During the time between when you get the latest version and the time you check in, someone else may have made changes. See figure 3.4.

Figure 3.4. During the time between when you get the latest version and check in, someone else may have checked in code.


CI and team dynamics

We’ve made light of the social aspects of Jackson’s plight, but they’re an important side benefit of CI. Without CI, Jackson’s error would go undiscovered until someone else needed to get some work done. Maybe that person would sit and grumble about the uselessness of the rest of the team and fix the problem, not even bothering to bring up the incident. Or perhaps she would confront Jackson with a trite “your code is broken” and walk away. Neither is the path to a healthy team relationship.

But when the build is monitored by a nonjudgmental third party (the build server), we’ve found that this often adds some levity to a team’s dynamics and helps them work better together. In past projects we’ve worked on, a failed build is an occasion for team members to get up and walk around and engage in some lighthearted jibes. Assuming it doesn’t happen often and the target doesn’t take it seriously, this camaraderie is a good thing because along with the problem being fixed right away, the team is bonding at the same time.


Although no one is at fault in this case, it results in code that doesn’t work in the version control repository. Again, the sooner you discover this, the sooner it can be fixed and you can return to your work. In practice, this scenario doesn’t happen often, and when it does, usually the changes don’t conflict. Also, many version control systems will recognize and warn you if this situation arises.

One of the major benefits CI and the check-in dance offer is that they get your team into a good rhythm of coding. Edit, build, get, build, commit, integrate...edit, build, get, build, commit, integrate... As you can see, there is rhythm in simply saying the words. We call the rhythm derived from this process “the metronome effect.”

3.4.3. The metronome effect

Musicians with particularly harsh childhood piano teachers may not appreciate the analogy, but it’s helpful to think of the CI process as a metronome. Like a good metronome, it helps set and keep the rhythm of a project.

The rhythm of a project is primarily defined by how often you check in code. If developers check in sporadically and sparingly, such as every week, the project has a slow rhythm. But if each developer checks in regularly (say, three or four times a day), the project moves along at a good tempo.

The combination of a good CI process and an edit-merge-commit version control system encourages the practice of checking in at a regular pace. Since we get near-instantaneous feedback from our CI system on whether the code we’ve just checked in works, we’ve removed an important pain point: the pain of checking out code that doesn’t work and the subsequent investigation of problems introduced by other people. If the automated build is executed on every check-in, any errors in the code will be flagged almost immediately after they are checked in.

This immediate feedback alone drastically improves our confidence in the code and, just as importantly, allows the VCS to fade into the background so that we aren’t afraid to get the latest version of the code. And with a CI process in place, each check-in is a reminder that the project is still in a stable state, further increasing our confidence.

Whereas the CI process allows you to check in more frequently, an edit-merge-commit VCS will encourage it because of the check-in dance. Part of the dance is to retrieve the latest version of the code and merge your changes into it. If there’s a conflict (if you have made changes to the same section of code as another developer since your last check-in), you need to manually examine these changes and decide which ones to keep—not an easy process and one made more difficult if there are many conflicts.

By checking in several times per day, you minimize the chance of these conflicts occurring as well as the impact when a conflict does arise. Check-ins are no longer a monumental event that requires an afternoon to merge your changes with the rest of the codebase.

One of the keys to achieving a regular check-in rhythm is not to wait until a feature has been completed before checking in. That practice is a throwback to the days when the source code repository was viewed as a collection of mini-releases and each feature was considered a unit of work. Developers were of the mindset that there was no benefit to be gained checking in code that half-worked.


What does working code mean?

This outdated view of version control is rooted in the definition of what it means for code to work. Traditionally, your code worked when it included a complete and tangible feature for the client to see—a good perspective for a client, but not for a developer. For you, pieces of code work simply if they do what you want them to do.

As we progress through this book, you’ll find that our definition of working code will change. At the moment, it means that the code compiles because that’s all our CI process is checking for. In the next chapter, we’ll expand on this and include automated testing. So working code will then encompass code that compiles and passes the automated tests. Later, we’ll include software metrics.

Once you’re able to produce working code at will, you can drop the antiquated notion that you have to wait until an entire feature is complete before checking in. That’s what labels are for in your VCS.


Instead of working to less-frequently reached goals, developers are encouraged to check in often. A good rule of thumb is to check in as soon as you’ve completed a slice of code and it’s covered completely by automated tests (more on this in chapter 4).

With a proper CI process in place, an interesting side effect occurs. The application is, at any given moment, releasable. What we mean by releasable is that we’re confident that the application can be put into a testing or production environment and that it will function within it. Although not all features may be complete, we have confidence that the ones that do exist work.

Having the team ticking like a diligent metronome reduces the friction associated with integrating code. Initially, you may be wary of checking in code for fear that you’ll break the build. Although breaking the build in the CI environment isn’t optimal, it’s serving its intended purpose: early identification of issues. We’ve found that once the team gets used to the idea, a check-in becomes almost a nonevent. The frequency of check-ins becomes a part of your physical habit as well as a foundation for your confidence in your work.

Hopefully, we’ve made clear the benefit of incorporating a CI process into your brownfield application. Now it’s time to get back to the practical application of it and examine the steps involved in the CI process.

3.5. Setting up a CI process

There are three general steps to setting up a CI process for an application:

1. Compile the application
2. Run automated tests
3. Create deployable releases

These steps may not seem like magic at first glance. In fact, they correspond to a subset of the steps in the check-in dance discussed in section 3.4. A developer compiles his code, tests it, then “releases it” (checks it into the source code repository). Each step builds incrementally on the others, as shown in figure 3.5.

Figure 3.5. The three basic steps of an automated build. Each of the incremental steps builds and relies on the previous.

The difference is that each of these steps is completely automated as part of the build script. You should have a process that, once completed, can be scheduled at regular intervals or executed based on some external event.

You don’t need to wait for the build server to perform these steps; the process can also be launched manually. Indeed, we encourage you to do so on your local development machine. This approach treats each developer’s machine as its own build server in a way. And it’s helpful to think of them as such because it gets you into the habit of running the build process locally in the same way it’s executed on the build server. In this way, you minimize the chance of problems being elevated into the source code repository.


Note

Running the build script locally, and often, increases the speed of the feedback loop to its maximum. Within seconds of adding or modifying code, each developer can know if they’ve introduced any issues into the application.


Over the course of this section, we’ll present code snippets in NAnt format. NAnt is a popular open source utility used to automate the build process. It’s XML based and started as a port of the Ant build utility for Java. You don’t need to be familiar with the syntax to understand the examples.
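To give you a feel for the format, here’s a minimal sketch of an NAnt build file; the project, property, and target names are illustrative:

<?xml version="1.0"?>
<project name="MyProject" default="compile">

  <!-- Properties are NAnt's variables; they can be overridden from the command line -->
  <property name="src.dir" value="src" />
  <property name="compile.dir" value="compile" />

  <target name="init">
    <!-- Create the folder that will hold the build artifacts -->
    <mkdir dir="${compile.dir}" />
  </target>

  <target name="compile" depends="init">
    <!-- Compilation tasks go here; see section 3.5.1 -->
  </target>

</project>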

A word of caution before we describe these in more detail: when implementing CI into a team that’s not familiar with the process, consider introducing each of these three steps individually rather than all at once. Introducing all the steps at once can be overwhelming and can quickly demoralize a team.

Let’s look at an example of what we mean by incremental introduction of the steps. In this example, the build process is based on an NAnt build file. On the server, we’ve installed TeamCity,[2] an application that allows us to monitor our version control system and launch the NAnt build script whenever someone has checked in code.

2 See http://jetbrains.com for information on how to install and configure TeamCity.

The first thing we do is modify the build file so that it compiles the application. This typically means one developer modifies the build file on her local machine according to section 3.5.1.

When she gets it to a state where it will successfully compile the application, the developer updates the build server with the new build file and verifies that TeamCity can successfully execute the build file and compile the application. After that, each developer updates his or her local copy of the build file and verifies that it executes successfully locally.


Note

This movement of updating the build file from the change initiator to the build server to the rest of the development team needs to be effortless. The best way that we’ve found to push changes to the team and build server is to include the build file in source control. As you’ll see in section 3.6, adding this build component to source control elevates the build script to first class status in the project.


We then repeat the process for step 2 once the team is comfortable with the automated compilation’s correctness. One developer modifies the build file so that it now executes the tests in addition to compiling the application. (See chapter 4 for more on automating tests.) He then updates the build server and the rest of the team updates their local copies of the build file.

Finally, we do the same for step 3. Processes to automate the release of the application are added to the build file and distributed to the team.

Alternatively, if a developer were to modify the build file to implement the steps all at once, there’s a greater chance that the build won’t work completely when it’s moved to the build server. As with any work that we do, we must test and verify the execution of the build script. If too much automation has been included all at once, it can be difficult to track down the changes needed to make the build process work correctly.

Let’s look at each step in the process in more detail.

3.5.1. Compile the application

The following NAnt target will compile your application according to how it is set up in your solution (.sln) file using the Debug configuration:
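Here’s a minimal sketch of such a target, assuming the solution file is named MyProject.sln and the <msbuild> task from the NAntContrib extensions is available:

<target name="compile" description="Compiles the solution in its Debug configuration">
  <msbuild project="src\MyProject.sln">
    <property name="Configuration" value="Debug" />
  </msbuild>
</target>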

In this snippet, we use a target, which is NAnt’s way of organizing pieces of work (similar to methods in a class). Within the target, we use the <msbuild> task to compile the application using MSBuild.

That’s all there is to it. Granted, our choice of tool has helped somewhat, but at its core, compiling the application is a pretty benign event.

Versioning the Application

While you’re setting up the compilation of the different assemblies, it’s advisable to have the process automatically version those files. In .NET, versioning of assemblies is handled through attribute values in the AssemblyInfo.vb or AssemblyInfo.cs files, depending on whether you’re using Visual Basic or C#. Build script tools, like NAnt, usually offer a way to automatically generate the AssemblyInfo file with contents that are determined during the build process. The alternative is to read in the file, modify it, and save it again.

Autogenerating an AssemblyInfo file lends itself to having your CI and build processes assign a version number to the assemblies being compiled. Using the capabilities of the build script tool and including the autogenerated custom AssemblyInfo file will achieve this for you. All that’s needed is a way to create or determine the version number that should be used.

If possible, try to have your CI tool (such as TeamCity, Hudson, or CruiseControl.NET) generate this version number. Most will generate a build number that you can then use in your application’s version number. Alternatively, there are ways to configure the build process to pull the build number from your VCS itself to incorporate into your application’s version number. In this way, your CI process can not only compile the application, but version it as well.
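As a sketch of how this might look in NAnt, the <asminfo> task can generate a shared AssemblyInfo file, with the version number supplied by the CI server. The file path and property names are illustrative:

<target name="version">
  <!-- Default to 0.0.0.0 locally; the build server overrides this value,
       for example with: nant -D:build.version=1.2.0.37 -->
  <property name="build.version" value="0.0.0.0" overwrite="false" />

  <asminfo output="src\CommonAssemblyInfo.cs" language="CSharp">
    <imports>
      <import namespace="System.Reflection" />
    </imports>
    <attributes>
      <attribute type="AssemblyVersionAttribute" value="${build.version}" />
      <attribute type="AssemblyFileVersionAttribute" value="${build.version}" />
    </attributes>
  </asminfo>
</target>

Because the property uses overwrite="false", a value passed in by the build server takes precedence over the local default, which also gives you the 0.0.0.0 safety check described in the sidebar that follows.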


Version number as a safety check

In environments with lax or nonexistent release processes, developers often migrate code to testing, and possibly even production, from their machines at their own whim. This strategy is particularly common in brownfield projects.

To counter this issue, have your build process default the application version number to 0.0.0.0. That way, when a developer builds the application locally, it will be tagged with this version number. But if the build server compiles it, it will overwrite the default and apply the normal version number.

This method of versioning an assembly doesn’t prevent a developer from accidentally deploying an assembly she compiled locally. But if a locally compiled assembly does somehow make it to a test server, it will be that much easier to identify from the 0.0.0.0 version number.


While we’re on the subject of automating our compilation, let’s take some time to consider the options available to us now.

Compiling Outside Visual Studio

In section 3.5.1 we created a task that compiles the application in the same way Visual Studio does. But now that we’re automating the process, we don’t necessarily need to do it the same way.

Consider the following task, which compiles all the C# files in a given folder (including files in a subfolder) into a single assembly using the csc.exe command-line compiler:
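Here’s a sketch of such a target; the property names, such as ${src.dir} and ${compile.dir}, are illustrative:

<target name="compile.core">
  <csc target="library" output="${compile.dir}\${compiled.assembly}" debug="true">
    <sources>
      <!-- Every .cs file under the project directory, including subfolders -->
      <include name="${src.dir}\**\*.cs" />
    </sources>
    <references>
      <!-- External assemblies the code depends on -->
      <include name="${thirdparty.nhibernate}" />
    </references>
  </csc>
</target>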

NAnt calls out to csc.exe to compile the files into a single assembly whose name is defined by the ${compiled.assembly} variable. The list of source files tells csc.exe which files to compile into the assembly; in this case, it’s all .cs files in the project directory, including subfolders. Finally, we tell csc.exe which external assemblies we’re referencing; in this case, a single reference to the assembly specified by the ${thirdparty.nhibernate} variable.

It may seem strange at first to compile your carefully layered application into a single assembly. Haven’t we been told for years to keep our application logic in separate layers to make it easier to maintain?

Using csc.exe or vbc.exe, as shown earlier, offers you the ability to control the contents of your final compiled assemblies at a much more granular level than using the solution and project files you normally see in Visual Studio. Making this transition allows you to view Visual Studio’s Solution Explorer window as a file organizational tool, one to help you keep the structure organized while you’re developing. But you may want to deploy the application differently, based on your physical layering instead. This approach takes away the pressure of trying to squeeze your logical or physical layering into the solution/project way of thinking. (We’ll come back to layering in chapter 8.)

The lack of solution/project alignment doesn’t mean that we’ve abandoned structure altogether. Instead we’re placing the files into folder structures that make hierarchical sense when we’re working in the editor. We still must align our compilation of these files with the deployment needs of our application and environment. Depending on your situation, this strategy can allow you to compile your application into, say, one assembly or executable per physical deployment location.


Challenge your assumptions: Solution Explorer isn’t a designer for layering your application

When we first began learning to incorporate logical and physical layering in our applications, we were taught to distinguish the separate layers by creating separate projects in our Visual Studio solutions. We ended up with solutions that had projects called MyApp.Data, MyApp.UI, MyApp.Business, and so on. Although this approach works, it also taught us to think that the IDE, and more specifically Solution Explorer, is a tool for designing both the logical and physical layering of our applications.

Instead of suffering this in the name of “We’ve always done it that way,” we suggest that you look at Solution Explorer merely as a tool for organizing files. Create projects when code concerns require you to, but don’t let your logical and physical layering dictate them. Instead, let namespacing and folder structures in one project delineate the layers.

For example, it often makes sense to separate your application into a UI layer, a business layer, and a data layer in Visual Studio. But if this is a Windows Forms application being distributed to the client, why do you need to deploy it as three separate assemblies? Why not roll it all into one to make deployment easier? (Yes, yes, we’ve heard the “What if you need to update only a single component?” and “What if you want to reuse the component in other apps?” arguments. Without getting into lengthy counterarguments, we’ll sidestep those questions. Our point is that the option is there and should be considered.)

When it comes time to build your application, use a scripting tool, like NAnt or MSBuild, to compile the raw code files into assemblies as you’d like to deploy them. The results may include many files for each physical layer that you’re deploying to. Also consider the possibility that creating one file per physical layer may be the simplest option available without harming the execution of the application.


If you’re thinking that this drastic difference in structure and compilation could cause significant and painful integration, you could be right, but only if you aren’t using the automated build script to continually integrate the code (in this case, integration is the act of compiling) on a frequent basis. This painful integration, above all other things, proves the worth of an automated build script that can be run easily, frequently, and locally by developers. If that build script can quickly provide positive and negative feedback to the developer, problems resulting from the difference in compilation and structure between the script and the IDE will be addressed often and before they become a problem for the entire team.


Tales from the trenches: Building the hard way

On one project we were on, the build process was manual. It wasn’t an easy build, either. The Build Person (when you have a title for a task, it should be immediately noted as a pain point) had a script that ran close to 20 tasks long. On top of that, about four of those tasks resided in a loop that had to be performed for each and every client the company had.

When the count of clients reached five, the pain of creating a release for any or all clients began to mount. By the time there were 15, it was almost unbearable. The “script” was nothing more than an infrequently updated document that had been printed and hung on the Build Person’s wall. Although it outlined the steps required to create a release package for each client, the intricacies of the process were lost on all other people in the company.

So the Build Person slogged on in his own version of torture. Each execution of the “script” took over an hour and a half of a tedious manual process involving copying files, changing references, incrementing assembly version numbers, and compiling assemblies, among other things. As you can imagine with such a manual process, problems were frequent.

Each client’s release package had to be manually, and independently, verified just to make sure that the assemblies could talk to each other. The result was a significant, and unnecessary, burden on the testing team. In the end, this process was one of the more obvious friction points between the testing and development teams.

Looking back on this project, we should’ve recognized the pain that could’ve been solved by automating such a tedious process. We didn’t know about build tools at the time, but all we needed was a batch script that could be executed from the command line. Although we learned a lot from this project, one of the biggest lessons was that we should never have a manual/error-prone process for something as important as creating release packages for our clients.


Now that we’ve discussed the nuances of and options for getting an automated build script to perform compilation, let’s look at the next step in the build process: the execution of automated tests.

3.5.2. Run automated tests

Once your build process is successfully compiling the application on the build server, the next step is to incorporate tests into the mix. This includes both unit and integration tests.


Note

This section is a precursor to chapter 4. You may want to skim and come back to it after you’ve had a chance to read the next chapter. We’ve included it here because we feel it’s important to have a CI process in place as soon as possible to close the feedback loop, even if the only thing it does at first is compile the application.


For the moment, we’ll assume your solution already contains unit tests and integration tests. If not, chapter 4 goes into more detail on how to set them up. If necessary, you can create a project with a single dummy test for the sake of setting up the CI process.

The following snippet is an example of a simple NAnt target that will execute NUnit (a unit-testing framework) against an assembly that contains automated tests:
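A sketch of what that target might look like follows; the paths and the test assembly name are illustrative:

<target name="test" depends="test.compile">
  <!-- Run NUnit's console runner against the compiled test assembly -->
  <exec program="${thirdparty.dir}\nunit\nunit-console.exe"
        workingdir="${compile.dir}"
        failonerror="true">
    <arg value="MyProject.Test.dll" />
    <arg value="/xml=test-results.xml" />
  </exec>
</target>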

The target shown here executes the nunit-console.exe application with appropriate parameters. It depends on another target, test.compile, which compiles the test project into an assembly called MyProject.Test.dll. We then shell out to nunit-console.exe to perform the testing.

When it has completed the execution of all the tests that it can find, NUnit will return a success or fail indication. We highly recommend that you make any failure within NUnit reflect as an overall failure of the build script—shown here with the failonerror attribute. If you don’t, your build will still be successful because it’s contingent only on whether NUnit was able to run and not on whether all the tests passed.

Setting up automated testing can include many different types of tests and tools. At the core are pure unit tests that only execute and verify isolated segments of code through unit-testing frameworks like NUnit. It’s also possible to launch integration, acceptance, and UI tests from your build script. By their nature, these tests usually take more time to execute and can be more difficult to set up and run. Tests of these types shouldn’t be relegated to second-class status, but their slower nature does impose a penalty on the velocity of the feedback loop that the developer is involved in. Section 3.7.1 addresses how to deal with long-running tests in your CI process.

Now that we’ve addressed testing in our build scripts, the final step in the build process is to create releases.

3.5.3. Create deployable releases

Now that your CI process is both compiling and testing your application, your developers are able to get into a rhythm, as discussed earlier. But there’s still one step left in the program: you must be able to create a releasable package of the application automatically at any given time.

What constitutes a releasable package depends on your scenario. In determining this definition, start with your current release process and try to automate and verify as much of it as possible.

If the release process for your application involves handing a Microsoft Installer (MSI) file to another team for server or client installation, you should strive to have your automated build create that MSI file. Other times you’ll work on an application that’s deployed using Xcopy capabilities. Your release, in that case, will probably be a file folder that can act as the source for the Xcopy to use.

Note that releases and deployments are two different things. A release is the deliverable that will be used during a deployment. A deployment is the act of installing a release into an environment.

A release is something that you can create every time you check in code. There’s benefit to doing so even if you don’t end up using many of the releases. It takes up a bit of disk space, but this problem can be mitigated with an appropriate retention policy on your releases. In exchange, you can have a deployable package at your fingertips at any given moment.

We’ll talk a little more about the releases folder in the next section.


Automating deployments

While you are in an automatin’ kinda mood, consider automating the deployment process as well as creating releases. Whether this involves deploying an updated one-click installation for a Windows application or deploying a web application to the test environment, there’s value in having a process that can be done with minimal intervention in a repeatable way.

Even after you automate the deployment process, it’s not often a good idea to deploy the application as part of every check-in, at least at first. If your team is in a rhythm of checking in often, many of the check-ins will result in no noticeable change to the testing team and, worse, could potentially create an acceptance testing environment that’s considered buggy.

Instead, create a CI process that runs the deployment process on a separate schedule, say, nightly or weekly. Or create one that’s launched manually whenever your team feels you have a package worth deploying.

In any case, deployment affects more than just your development team. We advise you to discuss deployment with your testing team with regard to the schedule and manner of deployment. If the schedule doesn’t work for them, expect to see an increase in complaints, often in the form of defect reports that say “This feature doesn’t work” when a change was either expected and not deployed or not expected and deployed.


Regardless of what’s to be distributed, you’ll still need to automate the process of building that package. There are a thousand ways to release software. Automating the simple ones may require only that you copy files from one location to another. More complicated scenarios may require you to zip files, stop and restart a web server, recreate a web application, or update a database.
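For the simple cases, a release target might be little more than a copy and a zip, as in this sketch; the folder and property names are illustrative:

<target name="package" depends="compile, test">
  <property name="release.dir" value="release\v${build.version}" />

  <!-- Copy the build artifacts into a versioned release folder -->
  <mkdir dir="${release.dir}" />
  <copy todir="${release.dir}">
    <fileset basedir="${compile.dir}">
      <include name="**/*" />
      <exclude name="**/*.pdb" />
    </fileset>
  </copy>

  <!-- Zip it so there is a single deliverable to hand off or archive -->
  <zip zipfile="release\MyProject-${build.version}.zip">
    <fileset basedir="${release.dir}">
      <include name="**/*" />
    </fileset>
  </zip>
</target>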


Tales from the trenches: A tale of two release processes

In the past, we’ve worked on projects that have varied levels of maturity in the creation of their release packages and the process of deploying that code to different environments. In one case, we participated in a project where every production deployment was frenetic, bordering on chaotic.

That project had multiple testing environments, some solely dedicated to the development team, available for releases to be tested in. Unfortunately, testing in these environments was limited to verification of the application’s functionality, not testing the deployment process. The result was that release packages, and the accompanying written installation scripts, were first tested the day of production deployment.

When issues surfaced, not all fixes were applied back to the artifacts used to create the release package. What was the result? We saw the same issues on many subsequent installations. Eventually someone would be involved in multiple releases and they’d get frustrated with having seen the same thing over and over, so it would be fixed. But because many people rotated through the responsibility of supporting the infrastructure team, many recurring issues were seen only once per person and quickly forgotten.

To contrast that tale of pain, we also worked on a project that had a mature release and deployment practice. Employing CI and automated build scripts, we created every release in an identical manner. Every time the CI server executed, the release package was created. In fact, we mandated that installation into any server environment had to be done only from a release package created by the CI server’s automated process.

On top of having practices that promoted consistency, the team was engrained with the belief that release packages needed thorough testing well before production installation. We made extensive use of different testing environments to iron out the minutest of issues we encountered.

Two situations occurred that proved the effort and diligence were worthwhile. The first was a new-to-the-project manager who required input for scheduling a production release. He stated, “Let’s schedule one week to do the production installation.” The team’s collective response was “Give us 5 minutes.” Too aggressively confident? Perhaps.

Proving that our confidence wasn’t misplaced, we arrived at work on a Monday morning to find out that the least experienced team member had installed the software to production the previous Friday. Although we knew that this day was coming, nothing had been formally confirmed or conveyed to the development team. The installation went so well that this developer said, “It only took a couple minutes.” On top of that we had yet to receive any help desk tickets during the 3 days it had been installed.

Well-practiced release creation and dedication to practicing production installations can pay huge dividends in how your project and team is perceived by management and the clients. In our mind, the week that you release your software to production should be the most anticlimactic week in your project.


Setting up and verifying automated releases and deployments can be time consuming and tedious. But putting in the effort to achieve this level of CI brings a great deal to a team by reducing the amount of time that you spend conducting releases and deployments. Because your CI process has been practicing releases and addressing any issues that surface immediately, the ease and stability of each release to a client will portray your team in a positive light. We all know it’s never bad to look good to the client.

You may have noticed a distinct lack of specifics for brownfield projects in this section. The good part about implementing CI is that it can be done at any time in a project’s life cycle. Whether you’re just starting out or already have a well-loved application in place, adding CI to the mix can be little more than automating what your team does every day—compiling and testing the application.

While the setup of CI can occur at any point in the life cycle of a project, there are some helpful conventions that impose more effort on a brownfield project than on one that’s greenfield. In the next section, we’ll examine how applying some of these conventions can introduce change to the folder structure of a brownfield application.

3.6. Build components

When working on an automated build process, you’ll create a number of components that will be used in the process. At a minimum you’ll have a build script file. You may also end up with supporting files that contain scripts for database creation and population, templates for config files for different testing environments, and others. These are files that you want to have in source control. Because these files are related only to the build process, we recommend that you organize them in a separate folder, as shown in figure 3.6.

Figure 3.6. Keep the build components separate from your code and third-party libraries.

In addition, you can create a project or solution folder within your Visual Studio solution to hold these files. See figure 3.7 for an example.

Figure 3.7. Build components can be stored in Visual Studio in solution folders or even separate projects for ease of use. Note that solution folders don’t map to physical folders.

As you can see in our sample folder structure, we have subdirectories to store different types of files. The config folder stores any configuration files that will be included in your release package. This folder also includes any templates that you have that are used for the creation of configuration files.


Environment templates

Templates are a great way to add environment-based configuration values to your project’s automated build process. They are akin to .ini files that contain variables with values specific to a certain environment.

For example, your main build script could make reference to a database connection string stored in a variable. When you initialize the script, you could tell it to load the variables from the appropriate template based on which environment it was building for (development environment, test environment, or production).

Examples of variables that make good candidates for environment templates include database connection strings, logging configurations, and third-party service endpoints.
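As a sketch of one way to apply a template, NAnt can replace tokens in the template while copying it into the build output. The token, file, and property names here are illustrative:

<target name="config">
  <!-- Web.config.template contains placeholders such as @CONNECTION_STRING@ -->
  <copy file="build\config\Web.config.template"
        tofile="${compile.dir}\Web.config"
        overwrite="true">
    <filterchain>
      <replacetokens>
        <token key="CONNECTION_STRING" value="${db.connectionstring}" />
      </replacetokens>
    </filterchain>
  </copy>
</target>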


We also show a separate sql folder, which includes folders for data and DDL (data definition language, the scripts that create or alter your tables). The purpose of the sql folder is to have a repository for any SQL scripting artifacts that you may have for the creation of your release. The data and DDL scripts are separated primarily for file management reasons. The DDL folder would hold all scripts needed to create a database, its tables, their keys and relationships, and any security entries needed. The data folder contains all the scripts for priming the database with any initial data the application needs, such as lookup values and application configuration settings.

Keep in mind that this structure is merely a recommendation. Our goal isn’t to impose a rigid folder structure for you to copy in your own brownfield application. Rather, we hope to get you thinking about your application in a different way, as more than just code. Our recommendation is based partially on simple organization of the files, concepts, and tools. It’s also based on the need to make the build script as simple as possible.

Your build script should create two additional folders in your local source tree. When compiling the code into assemblies and executables, you’ll need to put those compiled files, or build artifacts, somewhere. Having your script create a compile folder is a good way to keep the results of the build process separate from the raw code and build components. After you add automated testing, the compile folder can also be used as the working directory for the testing.

The second folder that you may want to have your build process create is a release folder. Here you can store the deployable releases we mentioned back in section 3.5.3. Figure 3.8 shows a sample structure.

Figure 3.8. Example of creating a release archive in your folder structure
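As a rough sketch, and again assuming a NAnt-style build script (the folder and target names are only suggestions), the creation and cleanup of these working folders might look like this:

<property name="compile.dir" value="compile" />
<property name="release.dir" value="release" />

<!-- Create the working folders before compiling -->
<target name="init">
  <mkdir dir="${compile.dir}" />
  <mkdir dir="${release.dir}" />
</target>

<!-- Remove build artifacts so each build starts from a clean slate -->
<target name="clean">
  <delete dir="${compile.dir}" failonerror="false" />
</target>

Having the script own this setup means a fresh checkout can be built without any manual folder preparation.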


Note

If you automate your releases into a folder such as the one in figure 3.8, it’ll look like this only on the build server. There’s no need for developers to store the various releases of the application on their local machines. In fact, if you have the build script automate the versioning of your application (as discussed in section 3.5.1), the only folders they should see are Latest and v0.0.0.0.


Neither the compile nor the release folder should be added to your VCS. The contents of the compile folder can be re-created at will so there’s no need to include it. As for the release folder, you should archive its contents, but that’s best handled through a separate nightly backup process, not your VCS.

You should also consider a retention policy, because the build process could potentially create dozens of archives per day. As a general rule, keep the release that was most recently deployed to the acceptance testing environments (and possibly the one that was deployed prior to that as well).

From the automated build scripts themselves, through the files required for compilation, to the archived releasable assemblies, a number of artifacts qualify as build components. Maintaining a flexible and reliable folder structure for them lays the foundation for a clean, efficient, and maintainable build process while also speeding up the feedback loop.

Continuing to narrow our focus, we’ll now turn our attention to the details of your CI process.

3.7. Tweaking the CI process

Now that you have a basic CI process in place that builds, tests, and creates releases, you can get back to developing and ignore the build process, right? Well, if you’ve read any technical books at all, you should know by now that the answer to any question that ends in “right?” is always no.

You should tweak the build process from time to time. Perhaps you have added a new tool that needs to be compiled into the test assembly in order to run the tests. Or maybe you’ve changed the structure of your application and the compilation order needs to be modified. Or you’ve added a web service and you need to incorporate it into the releasable package.

Whatever the case, you’ll find that your build process must be revisited whenever you do something out of the ordinary.

Now’s the time for us to examine some common situations that will require you to tweak your build process.

3.7.1. Handling long-running tests

The final step in our modified check-in dance is to stop what you’re doing until you receive confirmation that the build completed successfully. But what if your tests take 10, 15, even 30 minutes to run? Perhaps you have a suite of tests that primes and cleans up a testing database before and after each test. As you can imagine, this action occurring in hundreds or thousands of tests can extend the length of the test run into minutes or even hours.

In these cases, it isn’t practical to expect developers to wait around until the tests have completed. We need to relax the restrictions somewhat and consider how we can modify the build so that developers can get back to work quickly after a check-in but still be reasonably confident that the application is being properly tested.

You should consider this problem for two different testing locations: locally and on the build server. As a developer, you should be able to run the automated build script locally without having to wait more than a few minutes. If the local process takes any longer, many developers will have a natural tendency to avoid running it. This reluctance could lead to fewer check-ins and, thus, lower confidence in the code.

One way to overcome the friction caused by long-running tests is to separate the build process into two parts. The first part compiles the application, executes the unit tests, and performs a release of the software. This process should always execute quickly.

The second part of the process is similar to the first, except that it would also execute the slow-running tests.

These two test execution paths (the fast and the slow) should be exposed separately from the build script so that both the local developer and the CI server can execute them. Access to the different test runs has to be frictionless. A common solution is to wrap the execution of the script in separate command-line batch files, one for each path.
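As a sketch of how the two paths might be exposed, again assuming a NAnt-style script with NUnit tests split into separate fast and slow assemblies (the target and assembly names are illustrative, and the compile and package targets are assumed to exist elsewhere in the script):

<!-- Fast path: compile, run unit tests, and package a release -->
<target name="fast" depends="compile, unit-test, package" />

<!-- Slow path: everything in the fast path plus the long-running tests -->
<target name="slow" depends="fast, integration-test" />

<target name="unit-test" depends="compile">
  <nunit2>
    <formatter type="Plain" />
    <test assemblyname="${compile.dir}/MyApp.Tests.dll" />
  </nunit2>
</target>

<target name="integration-test" depends="compile">
  <nunit2>
    <formatter type="Plain" />
    <test assemblyname="${compile.dir}/MyApp.IntegrationTests.dll" />
  </nunit2>
</target>

The batch files then become one-liners: fast.bat runs nant fast and slow.bat runs nant slow, so neither developers nor the CI server need to remember which targets to call.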

The CI server can be configured so that it executes the fast build every time a developer checks in code. If the fast build completes successfully, it should trigger the slow build. With this implementation, you don’t have to wait until the integration tests have completed. Once the compilation and unit tests have run successfully, you can be confident enough to begin working on your next task. That confidence stems from the fact that your unit tests have passed and that the integration tests depend more on the environment. If any of them fail, chances are the issue lies in your infrastructure or network rather than in your code. This may not always be the case, but it’s a trade-off worth considering in the name of developer productivity.

Incidentally, while the slow build process is running, other developers shouldn’t have to wait for its completion to perform their own check-in process. If another developer checks in code while the integration tests are still running, the CI process should queue up another test run for the integration tests to execute as soon as the current run completes. Many CI software applications offer this capability natively. If you have, or have the potential for, a large number of slow-running tests, software that can easily be configured to handle build queuing can be vital to ensuring that all types of builds and tests are run as often as possible without impeding the developers.

3.7.2. Architectural changes and additions

From time to time, your project will require major architectural changes or additions. These changes alter your deployments and releases, and a change in deployment strategy will often affect the compiled structure of the application.

All of these things will likely require you to change the build script. When that happens, you’ll see the benefits of having a well-factored build script. Changes and additions should be easy to apply in isolation and with limited effect on the surrounding build process.

As with the initial creation of the build script, changes and additions require verification. Just as you compile, test, and verify code locally in a continuously integrating environment, you should execute, test, and verify the build script locally when you work on it. In this case, what you’re verifying is the compilation and the release package of the application.

3.7.3. Labeling the build

Another useful tweak is to label or tag each build. This labeling can, and should, be done automatically by the build process. Most VCSs allow for this, and most automated build tools will plug into this ability as well.

We’ve already discussed one way of labeling the build in section 3.5.1 when we talked about having your build process version your assemblies. Versioning your assemblies can be useful in identifying issues resulting from deploying the wrong version of your application.
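For example, a NAnt-style script can generate a shared AssemblyInfo file stamped with a version number supplied by the CI server. This is only a sketch; the property name, output path, and default value are assumptions.

<!-- The CI server would typically pass this in, e.g. nant -D:build.number=1.2.0.347 -->
<property name="build.number" value="0.0.0.0" overwrite="false" />

<target name="version">
  <asminfo output="src/CommonAssemblyInfo.cs" language="CSharp">
    <imports>
      <import namespace="System.Reflection" />
    </imports>
    <attributes>
      <attribute type="AssemblyVersionAttribute" value="${build.number}" />
      <attribute type="AssemblyFileVersionAttribute" value="${build.number}" />
    </attributes>
  </asminfo>
</target>

Each project can then link to CommonAssemblyInfo.cs so every compiled assembly carries the same version.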

Another way to label your application is in the version control repository. You can handle this process in many ways. Some CI tools offer this ability so that the code is automatically labeled whenever it’s retrieved for compilation. If your tool doesn’t offer this capability, build scripting tools can perform this operation.

Many build scripting tools have built-in integration with various VCSs. When this is the case, you can add labels to the VCS with a simple call from your build scripting tool of choice.

It’s possible that neither your CI server software nor your build scripting tool supports labeling for your VCS. If so, you can, as a last resort, have your build script call out to a command shell, execute a console tool for your VCS, and use its functionality directly to perform the labeling.
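For example, if the team uses Subversion, a NAnt-style script could shell out to the svn client to create a tag that includes the build number (the repository URLs are placeholders, and build.number is the same illustrative property used earlier):

<target name="label">
  <!-- Tag the code that produced this build, using the version generated by the CI process -->
  <exec program="svn">
    <arg value="copy" />
    <arg value="http://svnserver/repos/myapp/trunk" />
    <arg value="http://svnserver/repos/myapp/tags/v${build.number}" />
    <arg value="-m" />
    <arg value="Tagged by the automated build for version ${build.number}" />
  </exec>
</target>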

When applying a label to your code repository, be sure to include the version number generated by the CI process. Another good idea is to apply an appropriate label to the repository when the application is deployed. This can be invaluable in pinpointing the origin of bugs, especially ones that were thought to have been fixed.

We haven’t provided an exhaustive list of CI tweaks. Some of them may not even be necessary on every project. Although our primary goal was to give you an idea of some useful adjustments you can make to your build process, we also wanted to get you thinking outside the somewhat prescriptive steps we’ve provided. CI is a good first step in evaluating your brownfield project’s ecosystem because there are so many ways you can tweak it to get some quick wins. While you’re working, always be on the lookout for friction that could be removed by automating another task in your build process.

3.8. Summary

Continuous integration is an extremely beneficial practice when implemented on any project, brownfield or otherwise. It helps to build some much-needed confidence within the development team, allowing the team to trust the state of the code residing in the version control system. Developers will be able to look at the status of the CI server and immediately determine if the codebase is in a correctly compiling, tested, and deployed state. That, combined with the use of a check-in dance, should allow developers to never worry that getting the latest version will make their local codebase unusable.

The combination of CI and frequent code check-ins should have a near-immediate effect on your brownfield project. It will create a metronome effect that will get team members into a natural rhythm for the day-to-day work on their project. Often this will have a tremendous effect on the team’s confidence, which may have been shattered by the pain of having to fix others’ mistakes on a regular basis.

In this chapter we also refined the check-in dance to reflect the way that we work in a CI environment. The check-in dance looks like this:

1. Check out the file(s) you wish to modify.
2. Make your changes.
3. Ensure the application compiles and runs as expected.
4. Get the latest version of the application.
5. Ensure again that the application compiles and runs as expected.
6. Check in your changes.
7. Verify that the build server executes without error.

Implementing the core of CI can be a fairly simple task. The process of creating the automated build scripts that are run on the integration server, as well as locally by individual developers, can take more time. Take these tasks on incrementally. The work-to-benefit ratio weighs heavily in favor of the benefit side of the equation. Go forth now, and build...continuously.

In the next chapter, we’ll build (no pun intended) on our ecosystem by adding automated testing into the mix to increase our confidence even further.
