This chapter covers
Tom and Joe work as software developers for Acme Enterprises, a startup company that offers a free online service for finding the best deals in your area. The company recently received investor funding and is now frantically working toward its first official launch. Tom and Joe are in a time crunch. By the end of next month, they’ll need to present a first version of the product to their investors. Both developers are driven individuals, and they pump out features daily. So far, development of the software has stayed within the time and budget constraints, which makes them happy campers. The chief technology officer (CTO) pats them on the back; life is good. However, the manual and error-prone build and delivery process slows them down significantly. As a result, the team has to live with sporadic compilation issues, inconsistently built software artifacts, and failed deployments. This is where build tools come in.
This chapter will give you a gentle introduction into why it’s a good idea to automate your project and how build tools can help get the job done. We’ll talk about the benefits that come with sufficient project automation, the types and characteristics of project automation, and the tooling that enables you to implement an automated process.
Two traditional build tools dominate Java-based projects: Ant and Maven. We’ll go over their main features, look at some build code, and talk about their shortcomings. Lastly, we’ll discuss the requirements for a build tool that will fulfill the needs of modern-day project automation.
Going back to Tom and Joe’s predicament, let’s go over why project automation is such a no-brainer. Believe it or not, lots of developers face the following situations. The reasons are varied, but probably sound familiar.
The product launch is a raving success. The following week, the CTO swings by the developers’ desks; he already has new ideas to improve the user experience. A friend has told him about agile development, a time-boxed iterative approach for implementing and releasing software. He proposes that the team introduces two-week release cycles. Tom and Joe look at each other, both horrified at the manual and repetitive work that lies ahead. Together, they plan to automate each step of the implementation and delivery process to reduce the risk of failed builds, late integration, and painful deployments.
This story makes clear how vital project automation is for team success. These days, time to market has become more important than ever. Being able to build and deliver software in a repeatable and consistent way is key. Let’s look at the benefits of automating your project.
Having to manually perform steps to produce and deliver software is time-consuming and error-prone. Frankly, as a developer and system administrator, you have better things to do than to handhold a compilation process or to copy a file from directory A to directory B. We’re all human. Not only can you make mistakes along the way, manual intervention also takes away from the time you desperately need to get your actual work done. Any step in your software development process that can be automated should be automated.
The actual building of your software usually follows predefined and ordered steps. For example, you compile your source code first, then run your tests, and lastly assemble a deliverable. You’ll need to run the same steps over and over again—every day. This should be as easy as pressing a button. The outcome of this process needs to be repeatable for everyone who runs the build.
You’ve seen that being able to run a build from an IDE is very limiting. First of all, you’ll need to have the particular product installed on your machine. Second, the IDE may only be available for a specific operating system. An automated build shouldn’t require a specific runtime environment to work, whether this is an operating system or an IDE. Optimally, the automated tasks should be executable from the command line, which allows you to run the build from any machine you want, whenever you want.
You saw at the beginning of this chapter that a user can request a build to be run. A user can be any stakeholder who wants to trigger the build, like a developer, a QA team member, or a product owner. Our friend Tom, for example, pressed the Compile button in his IDE whenever he wanted the code to be compiled. On-demand automation is only one type of project automation. You can also schedule your build to be executed at predefined times or when a specific event occurs.
The typical use case for on-demand automation is when a user triggers a build on his or her machine, as shown in figure 1.1. It’s common practice that a version control system (VCS) manages the versioning of the build definition and source code files.
In most cases, the user executes a script on the command line that performs tasks in a predefined order—for example, compiling source code, copying a file from directory A to directory B, or assembling a deliverable. Usually, this type of automation is executed multiple times per day.
If you’re practicing agile software development, you’re interested in receiving fast feedback about the health of your project. You’ll want to know if your source code can be compiled without any errors or if there’s a potential software defect indicated by a failed unit or integration test. This type of automation is usually triggered if code was checked into version control, as shown in figure 1.2.
Think of scheduled automation as a time-based job scheduler (in the context of a Unix-based operation system, also known as a cron job). It runs in particular intervals or at concrete times—for example, every morning at 1:00 a.m. or every 15 minutes. As with all cron jobs, scheduled automation generally runs on a dedicated server. Figure 1.3 shows a scheduled build that runs every morning at 5:00 a.m. This kind of automation is particularly useful for generating reports or documentation for your project.
The practice that implements scheduled and triggered builds is commonly referred to as continuous integration (CI). You’ll learn more about CI in chapter 13. After identifying the benefits and types of project automation, it’s time to discuss the tools that allow you to implement this functionality.
Naturally, you may ask yourself why you’d need another tool to implement automation for your project. You could just write the logic as an executable script, such as a shell script. Think back to the goals of project automation we discussed earlier. You want a tool that allows you to create a repeatable, reliable, and portable build without manual intervention. A shell script wouldn’t be easily portable from a UNIX-based system to a Windows-based system, so it doesn’t meet your criteria.
What you need is a programming utility that lets you express your automation needs as executable, ordered tasks. Let’s say you want to compile your source code, copy the generated class files into a directory, and assemble a deliverable that contains the class files. A deliverable could be a ZIP file, for example, that can be distributed to a runtime environment. Figure 1.4 shows the tasks and their execution order for the described scenario.
Each of these tasks represents a unit of work—for example, compilation of source code. The order is important. You can’t create the ZIP archive if the required class files haven’t been compiled. Therefore, the compilation task needs to be executed first.
Internally, tasks and their interdependencies are modeled as a directed acyclic graph (DAG). A DAG is a data structure from computer science and contains the following two elements:
Each node knows about its own execution state. A node—and therefore the task—can only be executed once. For example, if two different tasks depend on the task “source code compilation,” you only want to execute it once. Figure 1.5 shows this scenario as a DAG.
You may have noticed that the nodes are shown in an inverted order from the tasks in figure 1.4. This is because the order is determined by node dependencies. As a developer, you won’t have to deal directly with the DAG representation of your build. This job is done by the build tool. Later in this chapter, you’ll see how some Java-based build tools use these concepts in practice.
It’s important to understand the interactions among the components of a build tool, the actual definition of the build logic, and the data that goes in and out. Let’s discuss each of the elements and their particular responsibilities.
The build file contains the configuration needed for the build, defines external dependencies such as third-party libraries, and contains the instructions to achieve a specific goal in the form of tasks and their interdependencies. Figure 1.6 illustrates a build file that describes four tasks and how they depend on each other.
The tasks we discussed in the scenario earlier—compiling source code, copying files to a directory, and assembling a ZIP file—would be defined in the build file. Oftentimes, a scripting language is used to express the build logic. That’s why a build file is also referred to as a build script.
A task takes an input, works on it by executing a series of steps, and produces an output. Some tasks may not need any input to function correctly, nor is creating an output considered mandatory. Complex task dependency graphs may use the output of a dependent task as input. Figure 1.7 demonstrates the consumption of inputs and the creation of outputs in a task graph.
I already mentioned an example that follows this workflow. We took a bunch of source code files as input, compiled them to classes, and assembled a deliverable as output. The compilation and assembly processes each represent one task. The assembly of the deliverable only makes sense if you compiled the source code first. Therefore, both tasks need to retain their order.
The build file’s step-by-step instructions or rule set must be translated into an internal model the build tool can understand. The build engine processes the build file at runtime, resolves dependencies between tasks, and sets up the entire configuration needed to command the execution, as shown in figure 1.8.
Once the internal model is built, the engine will execute the series of tasks in the correct order. Some build tools allow you to access this model via an API to query for this information at runtime.
The dependency manager is used to process declarative dependency definitions for your build file, resolve them from an artifact repository (for example, the local file system, an FTP, or an HTTP server), and make them available to your project. A dependency is generally an external, reusable library in the form of a JAR file (for example, Log4J for logging support). The repository acts as storage for dependencies, and organizes and describes them by identifiers, such as name and version. A typical repository can be an HTTP server or the local file system. Figure 1.9 illustrates how the dependency manager fits into the architecture of a build tool.
Many libraries depend on other libraries, called transitive dependencies. The dependency manager can use metadata stored in the repository to automatically resolve transitive dependencies as well. A build tool is not required to provide a dependency management component.
In this section, we look at two popular, Java-based build tools: Ant and Maven. We’ll discuss their characteristics, see a sample script in action, and outline the shortcomings of each tool. Let’s start with the tool that’s been around the longest—Ant.
Apache Ant (Another Neat Tool) is an open source build tool written in Java. Its main purpose is to provide automation for typical tasks needed in Java projects, such as compiling source files to classes, running unit tests, packaging JAR files, and creating Javadoc documentation. Additionally, it provides a wide range of predefined tasks for file system and archiving operations. If any of these tasks don’t fulfill your requirements, you can extend the build with new tasks written in Java.
While Ant’s core is written in Java, your build file is expressed through XML, which makes it portable across different runtime environments. Ant does not provide a dependency manager, so you’ll need to manage external dependencies yourself. However, Ant integrates well with another Apache project called Ivy, a full-fledged, standalone dependency manager. Integrating Ant with Ivy requires additional effort and has to be done manually for each individual project. Let’s look at a sample build script.
To understand any Ant build script, you need to start with some quick nomenclature. A build script consists of three basic elements: the project, multiple targets, and the used tasks. Figure 1.10 illustrates the relationship between each of the elements.
In Ant, a task is a piece of executable code—for example, for creating a new directory or moving a file. Within your build script, use a task by its predefined XML tag name. The task’s behavior can be configured by its exposed attributes. The following code snippet shows the usage of the javac Ant task for compiling Java source code within your build script:
While Ant ships with a wide range of predefined tasks, you can extend your build script’s capabilities by writing your own task in Java.
A target is a set of tasks you want to be executed. Think of it as a logical grouping. When running Ant on the command line, provide the name of the target(s) you want to execute. By declaring dependencies between targets, a whole chain of commands can be created. The following code snippet shows two dependent targets:
Mandatory to all Ant projects is the overarching container, the project. It’s the top-level element in an Ant script and contains one or more targets. You can only define one project per build script. The following code snippet shows the project in relation to the targets:
Say you want to write a script that compiles your Java source code in the directory src using the Java compiler and put it into the output directory build. Your Java source code has a dependency on a class from the external library Apache Commons Lang. You tell the compiler about it by referencing the library’s JAR file in the classpath. After compiling the code, you want to assemble a JAR file. Each unit of work, source code compilation, and JAR assembly will be grouped in an individual target. You’ll also add two more targets for initializing and cleaning up the required output directories. The structure of the Ant build script you’ll create is shown in figure 1.11.
Ant doesn’t impose any restrictions on how to define your build’s structure. This makes it easy to adapt to existing project layouts. For example, the source and output directories in the sample script have been chosen arbitrarily. It would be very easy to change them by setting a different value to their corresponding properties. The same is true for target definition; you have full flexibility to choose which logic needs to be executed per target and the order of execution.
Despite all this flexibility, you should be aware of some shortcomings:
Using Ant across many projects within an enterprise has a big impact on maintainability. With flexibility comes a lot of duplicated code snippets that are copied from one project to another. The Maven team realized the need for a standardized project layout and unified build lifecycle. Maven picks up on the idea of convention over configuration, meaning that it provides sensible default values for your project configuration and its behavior. The project automatically knows what directories to search for source code and what tasks to perform when running the build. You can set up a full project with a few lines of XML as long as your project adheres to the default values. As an extra, Maven also has the ability to generate HTML project documentation that includes the Javadocs for your application.
Maven’s core functionality can be extended by custom logic developed as plugins. The community is very active, and you can find a plugin for almost every aspect of build support, from integration with other development tools to reporting. If a plugin doesn’t exist for your specific needs, you can write your own extension.
By introducing a default project layout, Maven ensures that every developer with the knowledge of one Maven project will immediately know where to expect specific file types. For example, Java application source code sits in the directory src/main/java. All default directories are configurable. Figure 1.12 illustrates the default layout for Maven projects.
Maven is based on the concept of a build lifecycle. Every project knows exactly which steps to perform to build, package, and distribute an application, including the following functionality:
Every step in this build lifecycle is called a phase. Phases are executed sequentially. The phase you want to execute is defined when running the build on the command line. If you call the phase for packaging the application, Maven will automatically determine that the dependent phases like source code compilation and running tests need to be executed beforehand. Figure 1.13 shows the predefined phases of a Maven build and their order of execution.
In Maven projects, dependencies to external libraries are declared within the build script. For example, if your project requires the popular Java library Hibernate, you simply define its unique artifact coordinates, such as organization, name, and version, in the dependencies configuration block. The following code snippet shows how to declare a dependency on version 4.1.7. Final of the Hibernate core library:
At runtime, the declared libraries and their transitive dependencies are downloaded by Maven’s dependency manager, stored in the local cache for later reuse, and made available to your build (for example, for compiling source code). Maven preconfigures the use of the repository, Maven Central, to download dependencies. Subsequent builds will reuse an existing artifact from the local cache and therefore won’t contact Maven Central. Maven Central is the most popular binary artifact repository in the Java community. Figure 1.14 demonstrates Maven’s artifact retrieval process.
Dependency management in Maven isn’t limited to external libraries. You can also declare a dependency on other Maven projects. This need arises if you decompose software into modules, which are smaller components based on associated functionality. Figure 1.15 shows an example of a traditional three-layer modularized architecture. In this example, the presentation layer contains code for rendering data in a webpage, the business layer models real-life business objects, and the integration layer retrieves data from a database.
The following listing shows a sample Maven build script named pom.xml that will implement the same functionality as the Ant build. Keep in mind that you stick to the default conventions here, so Maven will look for the source code in the directory src/main/java instead of src.
As with Ant, be aware of some of Maven’s shortcomings:
In the last section, we examined the features, advantages, and shortcomings of the established build tools Ant and Maven. It became clear that you often have to compromise on the supported functionality by choosing one or the other. Either you choose full flexibility and extensibility but get weak project standardization, tons of boilerplate code, and no support for dependency management by picking Ant; or you go with Maven, which offers a convention over configuration approach and a seamlessly integrated dependency manager, but an overly restrictive mindset and cumbersome plugin system.
Wouldn’t it be great if a build tool could cover a middle ground? Here are some features that an evolved build tool should provide:
This book will introduce you to a tool that does provide all of these great features: Gradle. Together, we’ll cover a lot of ground on how to use it and exploit all the advantages it provides.
Life for developers and QA personnel without project automation is repetitive, tedious, and error-prone. Every step along the software delivery process—from source code compilation to packaging the software to releasing the deliverable to test and production environments—has to be done manually. Project automation helps remove the burden of manual intervention, makes your team more efficient, and leads the way to a push-button, fail-safe software release process.
In this chapter, we identified the different types of project automation—on-demand, scheduled, and triggered build initiation—and covered their specific use cases. You learned that the different types of project automation are not exclusive. In fact, they complement each other.
A build tool is one of the enablers for project automation. It allows you to declare the ordered set of rules that you want to execute when initiating a build. We discussed the moving parts of a build tool by analyzing its anatomy. The build engine (the build tool executable) processes the rule set defined in the build script and translates it into executable tasks. Each task may require input data to get its job done. As a result, a build output is produced. The dependency manager is an optional component of the build tool architecture that lets you declare references to external libraries that your build process needs to function correctly.
We saw the materialized characteristics of build tools in action by taking a deeper look at two popular Java build tool implementations: Ant and Maven. Ant provides a very flexible and versatile way of defining your build logic, but doesn’t provide guidance on a standard project layout or sensible defaults to tasks that repeat over and over in projects. It also doesn’t come with an out-of-the-box dependency manager, which requires you to manage external dependencies yourself. Maven, on the other hand, follows the convention over configuration paradigm by supporting sensible default configuration for your project as well as a standardized build lifecycle. Automated dependency management for external libraries and between Maven projects is a built-in feature. Maven falls short on easy extensibility for custom logic and support for nonconventional project layouts and tasks. You learned that an advanced build tool needs to find a middle ground between flexibility and configurable conventions to support the requirements of modern-day software projects.
In the next chapter, we’ll identify how Gradle fits into the equation.