In our last chapter, we created and ran our first Bazel project, focusing mostly on the bare minimum to get something up and running. During the course of that chapter, we employed two specially named files: BUILD and, to a much lesser extent, WORKSPACE. While the use of the BUILD file was apparent, we (intentionally) left the WORKSPACE file alone. In this chapter, we are going to explore a greater set of functionalities for the WORKSPACE file.
Note
This chapter will give a high-level overview of the WORKSPACE file; however, performing the exercises will be crucial to starting to get a feel for how it actually works.
WORKSPACE Files
In the last chapter, we left the WORKSPACE file completely blank. That example needed nothing more, because both of the following were true:
All code was within a single, local repository.
The only rules and build targets required came out of the box from Bazel.
In practice, however, this combination is usually not viable for most projects. You will need to depend upon additional functionality and, in all likelihood, employ other languages or types of build targets for your projects. The WORKSPACE file is the place to set the stage for the body of functionality and rules required by your project.
Adding New Rules to WORKSPACE
As stated earlier, Bazel comes with out-of-the-box support for a number of build rules. For example, there are rules that define how to build, compile, link, and so on for C++, Java, and Python, among others. A vanilla Bazel project also has utility rules that define how resources (e.g., data files) should be packaged and referenced within your project.
One of the most powerful aspects of Bazel is the ability to add new rules to expand its capabilities. By adding new rules to our project’s workspace, we can retrieve remote dependencies, add new languages, and more.
Notably, there are rules which are packaged with Bazel that are not automatically loaded by default. This enables you as a project creator to have explicit control over what rules you want to have available within your project.
The basic command that we will use to load in new rules is load, which is built into Bazel.
Note
The load command will be used both within WORKSPACE and BUILD files. As you might have guessed, we will end up using this to explicitly pull new types of functionality into our BUILD file as well.
A load command takes the path of a file followed by the names of one or more symbols. Bazel reads the file found at that path and loads the specified symbols into the local environment of the file that contains the load command; when placed into the WORKSPACE file, the symbols become available to the rest of the WORKSPACE file.
In the previous chapter, we had left the WORKSPACE file completely empty and relied solely on the built-in rules. Now let’s start to add a bit of new functionality into the WORKSPACE file.
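The general shape of a load command can be sketched as follows. The path //path/to:rules.bzl and the name some_symbol are hypothetical placeholders used only for illustration; they do not exist in this project.

```starlark
# General form: the path (label) of a .bzl file, then one or more symbol names.
# Both the label and the symbol name here are hypothetical placeholders.
load("//path/to:rules.bzl", "some_symbol")
```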
A simple load command
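Judging from the path and symbols discussed in the rest of this section, the load command in question likely looks like the following sketch, placed at the top of the WORKSPACE file.

```starlark
# WORKSPACE
# Pull the http_archive rule out of the built-in bazel_tools repository.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
```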
Save that file to disk.
Congratulations! You’ve just added some functionality into your WORKSPACE file. Of course, that functionality does not actually do anything at the moment (we will get to that in the next section).
Note
A sharp observer will once again notice the introduction of a new file type: .bzl. Although outside the scope of this book, it is sufficient to know that .bzl files are used to define rules for Bazel (e.g., build rules) and give us the ability to expand Bazel’s capabilities (e.g., the addition of new languages).
A Deeper Dive into the Load Path
If you are taking a close look at the load path from the preceding example, you might notice something interesting: that particular path does not exist within your file system. So, where is this coming from?
The very first element of the path is @bazel_tools. The @ signifies to Bazel that you are loading from a particular Bazel repository, in this case one called bazel_tools. Everything to the right of @bazel_tools specifies the path to a file within that repository.
This detail will become very important shortly. As your project begins to reference functionality found in other Bazel repositories, you will disambiguate those repositories using a name. This allows you to create absolute paths to the build targets you require for your project.
At this point in time, you still haven’t pulled in any external repositories, so where did bazel_tools actually come from? The bazel_tools repository is special and (sort of) comes “out of the box.”
This is essential since it comes with some important functionality, not the least of which is the ability to pull in other repositories. Consider this a bootstrapping repository you acquire by virtue of installing Bazel and creating a WORKSPACE.
Finding the bazel_tools Repository
Notice that all we have here are the directories and files that we had created previously; our clean command has eradicated all build products, dependencies, outputs, and so on. At this point in time, the Bazel project is effectively untouched; no Bazel commands have actually been executed. Bazel strives to never download more than it needs at a given point in time; as such, it won’t even download bazel_tools to a given WORKSPACE unless it absolutely needs to.
Congratulations! You’ve found the repository. Notably, where it is located illustrates a few important points about Bazel.
Note
For the curious, if you continue to explore through the bazel_tools directory, you will find the tools/build_defs/repo directory there. This is where the http.bzl file that you loaded earlier lives.
First, each project is meant to be its own definitive source of truth. There is not a central location across all of your projects where a common bazel_tools repository exists; each project is meant to get its own version of a repository (although this doesn’t prevent Bazel from doing some optimization behind the scenes to share repositories via file linking).
Second, Bazel will not download a dependency unless it is absolutely required to do so. We will revisit this later on in this chapter; however, even if you create new external dependencies in your WORKSPACE file, Bazel will not download them if you never use anything from them. This goes to the heart of the notion that by making everything explicit, Bazel can do some cool optimizations.
Loading Multiple Rules at the Same Time
Before we leave this section, it is worthwhile to know that it is possible to have multiple rules within the same file. While you could execute multiple load commands in order to pull in the desired functionality, you can also just retrieve all the necessary symbols at once.
A load command that pulls in multiple symbols
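A single load command that retrieves both symbols can be sketched like this, assuming both come from the same http.bzl file used earlier.

```starlark
# WORKSPACE
# One load command can name several symbols from the same .bzl file.
load(
    "@bazel_tools//tools/build_defs/repo:http.bzl",
    "http_archive",
    "http_file",
)
```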
This will load both the http_archive and http_file symbols into your workspace.
Referencing Other Dependencies
In the last chapter, we explicitly downloaded the JUnit libraries and added these directly to our project. This fits really well into the model that Bazel prefers (i.e., a monorepo).
However, Bazel provides the ability to reference other external dependencies in a couple of different ways. This provides some additional flexibility by allowing you to add to your project without ingesting the dependencies explicitly.
http_archive
git_repository
Note that while these rules used to be out of the box for earlier versions of Bazel, you need to load them explicitly to get them into your project.
Each of these rules is designed to retrieve remote Bazel repositories and make their contained targets available as dependencies for your project.
http_archive
http_archive is used to reference and retrieve a compressed Bazel repository, given a path to said compressed file. Once the compressed repository has been retrieved, it is decompressed, and the contained rules, targets, and so on can be used within your project.
Example http_archive
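A declaration matching the description in this section might look like the sketch below. The name foo and the URL are the illustrative values used in the text, not a real repository.

```starlark
# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    # The name used to refer to this repository elsewhere, as in @foo.
    name = "foo",
    # Where to fetch the compressed repository from (illustrative URL).
    urls = ["http://my_favorite_url.com/path/to/archive.zip"],
)
```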
Let’s break that down a little bit. The rule retrieves the repository from the location http://my_favorite_url.com/path/to/archive.zip. Assuming this is successful, the archive will be downloaded and decompressed (if it hasn’t been already), making its contents available for use.
Now, earlier we discussed how we needed to use the label @bazel_tools in order to use any functionality within that repository. In a similar fashion, in order to make use of any functionality in our new repository foo, we need to use the label @foo.
http_archive for Go language rules
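The declaration for the Go rules can be sketched as follows. The repository name io_bazel_rules_go comes from the text; the URL is deliberately left as a placeholder, since the release to use is not stated here. Copy the archive URL from the releases page of the bazelbuild/rules_go project on GitHub.

```starlark
# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "io_bazel_rules_go",
    # Placeholder, not a real URL: substitute the archive URL of the
    # rules_go release you want to use.
    urls = ["<RULES_GO_RELEASE_ARCHIVE_URL>"],
)
```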
Save your WORKSPACE file.
As you can imagine, this will pull down the compressed repository for the Go language rules and decompress it, making the repository’s targets available for use. As we explored earlier, this repository will ultimately end up in the chapter_04/bazel-chapter_04/external directory (notably, it won’t be there right away, for reasons we discussed earlier in the chapter).
Retrieving functionality for Go
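A sketch of the corresponding load command is shown below. The file go:deps.bzl and the two symbol names are the ones used in the rules_go setup instructions; treat them as an assumption to verify against the release you downloaded.

```starlark
# WORKSPACE (after the http_archive declaration for io_bazel_rules_go)
# The @io_bazel_rules_go prefix selects the repository declared above.
load(
    "@io_bazel_rules_go//go:deps.bzl",
    "go_register_toolchains",
    "go_rules_dependencies",
)
```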
Save this to your WORKSPACE file.
You should notice that you needed to specify @io_bazel_rules_go to form the correct path to get access to the underlying functionality.
git_repository
While http_archive is focused on retrieving a compressed archive of a Bazel repository (whether it is part of an SCM system or not), git_repository is used to clone a git repository and check it out at a given commit (or tag).
Loading and using the git_repository
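A minimal sketch of loading and using git_repository follows. The name foo, the remote URL, and the commit value are all placeholders for illustration.

```starlark
# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

git_repository(
    # Disambiguating name for the repository, referenced as @foo.
    name = "foo",
    # Placeholder: the URL of the Git repository to clone.
    remote = "https://github.com/<owner>/<repository>.git",
    # Placeholder: the full hash of the commit to check out.
    commit = "<FULL_COMMIT_HASH>",
)
```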
Having broken down http_archive, you will notice some features that look very similar. Here, name operates identically, acting as a disambiguating label for the repository. Similar to http_archive’s urls parameter, remote specifies the location of the Git repo that we want to clone (e.g., somewhere like GitHub.com). The only major difference is commit, which specifies the version of the repo to actually retrieve.
Retrieving a Git Repository
Retrieving the repository for Go
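For the Go rules, the declaration might look like the sketch below. The remote points at the bazelbuild/rules_go project on GitHub; the commit value is a placeholder, since the specific hash is not reproduced here.

```starlark
# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

git_repository(
    name = "io_bazel_rules_go",
    remote = "https://github.com/bazelbuild/rules_go.git",
    # Placeholder: substitute the full hash of the commit you want.
    commit = "<FULL_COMMIT_HASH>",
)
```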
Caution
The specific commit hash used here is only current as of the time of this writing; you may need to check the repo for a more current one.
Prior to saving this into your WORKSPACE file, it is highly recommended to comment out the http_archive version of the same request. Otherwise, your http_archive declaration and your git_repository declaration will both use the same name. Bazel will disambiguate which one “wins” by taking the last one in the file; however, for the sake of clarity, you shouldn’t introduce ambiguity into the dependencies in your WORKSPACE file.
Save your WORKSPACE file.
Fine Print on git_repository
Although git_repository clones a remote git repository into your Bazel project, this does not actually confer the ability to work with it as you would with a normal git repository. That is, you cannot go into the directory that contains the Git repo (e.g., bazel-chapter_04/external/<name of git repository>) and start performing a typical set of git operations (e.g., commit, push, etc.). And, given where the repository is placed, this should make sense: all of the bazel-* directories are ephemeral. All of them can be removed by a simple act of bazel clean, which could easily eliminate any locally created changes.
One way to make edits to an external git repository and have them reflected in your project is to clone that repo separately, make and commit your changes, and then update your project’s commit hash to match the newly created commit. Admittedly, this may not be the smoothest workflow; however, remember that Bazel is constantly focused on reproducibility. Explicitly tracking the dependencies is one of the keys that gives Bazel its power.
Using the tag instead of the commit hash
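Swapping the commit attribute for tag can be sketched as follows; the tag value is a placeholder for whichever tag exists in the repo you are tracking.

```starlark
# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

git_repository(
    name = "io_bazel_rules_go",
    remote = "https://github.com/bazelbuild/rules_go.git",
    # Placeholder: the name of a tag in the remote repository.
    # Use tag in place of commit, not alongside it.
    tag = "<TAG_NAME>",
)
```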
Now instead of being locked onto a specific commit hash, you track to a particular tag; if you make updates to the Git repo (and subsequently update the tag), your project will get the version corresponding to the tag.
As an alternative to both tag and commit, you can also use “branch” to refer to a specific branch of a Git repo.
Note
You must choose among “tag,” “commit,” or “branch” to refer to a particular version of the code; you cannot use more than one at the same time.
While this makes working with external repos more convenient, it provides a much weaker guarantee than the commit hash. That convenience helps during development, but it can also lead to issues in practice, since you are dependent upon what amounts to a floating version of the code.
http_archive vs. git_repository
Both http_archive and git_repository are tools for referencing external Bazel repositories; however, this raises the question “Given the option between the two, which should I use?” For example, GitHub provides both git repositories (obviously) and archives.
As a default, the recommendation from Bazel is to prefer http_archive. This makes sense, since it provides the strongest guarantee of reproducibility (i.e., the archive is static for a given version). It is also faster to download and extract an archive than to clone a repository. Additionally, it obviates the need to install git to build a project. This is an especially good idea for dependencies whose versions are expected to change slowly.
Note
Strictly speaking, the contents of even an http_archive URL may change. In order to strengthen the guarantee for retrieving the correct files for the sake of reproducibility, there is another attribute, sha256, which contains the expected SHA-256 of the archive to retrieve. Although this field is omitted for simplicity here, for real development, you should set this field in order to ensure the hermeticity of the build.
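As a sketch of how the sha256 attribute fits into a declaration, the earlier illustrative foo example could be extended as follows. The checksum value is a placeholder; you would compute the real one from the archive you intend to depend on.

```starlark
# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "foo",
    urls = ["http://my_favorite_url.com/path/to/archive.zip"],
    # Placeholder: the expected SHA-256 of the archive, as a hex string.
    # Bazel rejects the download if the actual checksum differs.
    sha256 = "<SHA256_OF_ARCHIVE>",
)
```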
On the other hand, if an archive is unavailable or you need to work with external dependencies that are changing rapidly, then git_repository may make a lot more sense (especially given the aforementioned ability to work with git tags).
As a last word, Bazel projects favor being monolithic, so one avenue to consider is to avoid using external dependencies and pull the necessary code directly into your project. This is not always possible or convenient but does provide the strongest guarantees for your builds.
Employing a New Language
While Bazel comes with several languages out of the box, we need to be able to add more languages as needed to our project. In case you have not guessed it, we will add support for Go into our project.
Loading the Go language rules
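Putting the earlier pieces together, the complete WORKSPACE file might look like the sketch below. The archive URL is a placeholder, and the two function names are those used in the rules_go setup instructions; depending on the rules_go release, go_register_toolchains may require additional arguments (such as a Go version), so check the documentation for the release you use.

```starlark
# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "io_bazel_rules_go",
    # Placeholder, not a real URL: substitute a rules_go release archive.
    urls = ["<RULES_GO_RELEASE_ARCHIVE_URL>"],
)

load(
    "@io_bazel_rules_go//go:deps.bzl",
    "go_register_toolchains",
    "go_rules_dependencies",
)

# The last two lines invoke the loaded rules to set up Go for the project.
go_rules_dependencies()
go_register_toolchains()
```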
Save the WORKSPACE file.
You’ve seen most of that example previously, with the exception of the last two lines. The last two lines invoke the loaded rules to set up the Go language for your project.
Note
Unlike many constructs and patterns in Bazel, the above two lines should not be considered a canonical example for all languages. Each new language has its own set of one or more rules for setup, so the function names will likely be slightly different each time.
Hello World in Go
Save this to hello_world.go.
Now, let’s crack open the BUILD file in the src directory so we can create the Go target. However, prior to actually creating the target, we need to explicitly load up the Go rules.
Loading the Go language rules
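In the BUILD file, the load command might look like the following sketch. It assumes go_binary is exported from go:def.bzl, which is where rules_go documents its build rules.

```starlark
# src/BUILD
# Make the go_binary rule available within this BUILD file.
load("@io_bazel_rules_go//go:def.bzl", "go_binary")
```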
Create the go_binary target
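A minimal go_binary target can be sketched as follows. The target name hello_world_go is a hypothetical choice for illustration; the source file name comes from the previous step.

```starlark
# src/BUILD
load("@io_bazel_rules_go//go:def.bzl", "go_binary")

go_binary(
    # Hypothetical target name; pick whatever suits your project.
    name = "hello_world_go",
    # The Go source file saved earlier.
    srcs = ["hello_world.go"],
)
```

With that name, the target would be built with a command along the lines of bazel build //src:hello_world_go.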
Congratulations! You now have added a new language and created a target for it!
Locating the Go Language Rules Repository
As discussed previously, Bazel only pulls down the dependencies required to build your requested target. If you performed a clean action, followed by a build of a non-Go target (e.g., the earlier Java targets), you would find that io_bazel_rules_go does not exist within your bazel-chapter_04/external directory, despite the fact that both targets exist in the same BUILD file.
Exercise – Add Yet Another Language
Throughout the section on the WORKSPACE file, we’ve been building up the knowledge for adding a new language into your project. Now that you have the tools for this, you should continue to explore adding new languages to your project.
Go to https://github.com/bazelbuild and look through the various rules packages they have available. Notably, while there will be many common languages there, some rule sets might be outside of the bazelbuild organization. If you don’t find a language to your liking, you can find an even larger list at Awesome Bazel (https://awesomebazel.com). Select your favorite language, set up your WORKSPACE file, and create a build target for that language.