CHAPTER 10
Securing Your Git Repositories

The familiar process of writing code, pushing it to a code repository, and waiting patiently for a peer review became increasingly popular at the start of the adoption of DevOps methodologies. Raising a pull request (PR) for the code changes you want to be merged before code is pulled into the master branch and set live in your applications meant that more than one set of eyes approved those changes before they could potentially cause issues with the applications.

Of course, the code repositories in which you store your application code are another attack surface that is often overlooked on modern Cloud Native estates, and one that can offer hackers the keys to your kingdom.

When using code repositories with AWS, for example, developers often accidentally push access keys and secret keys to the likes of GitHub, BitBucket, or GitLab (services built around Git, the version control software written by the venerable Linus Torvalds, who brought us Linux). Certificates and plain-text passwords are also common residents of online repositories. If you do not think that the storing of precious secrets is a massive problem for today's developers, visit this link: github.com/search?q=PRIVATE+KEY&type=Code.

The search engine on GitHub allows any registered user to hunt for the string PRIVATE KEY via their web interface. At the time of writing, there are a staggering 85,955,849 search results returned from that string. Of course, the vast majority will be correctly programmed code with references to where a key might be stored, but what are the bets that, with some patience, there are a few results that could really benefit an attacker?

There are a few ways of monitoring the storage of such secrets in code repositories, including tokens, certificates, and access keys. In this chapter, we will explore two Open Source tools that can help you automate their discovery.

Things to Consider

There are a few prerequisites to consider when setting up your code repositories for security. The first is that you need to make sure the hosting of your code repository is secure and suitable for your needs. Even some exceptionally large enterprises trust their intellectual property to GitHub's infrastructure, the most popular online git repository service. Microsoft bought GitHub at the end of 2018 in an attempt to increase its reach further into the Open Source community, which now puts some organizations off. For obvious reasons, the majority of larger organizations simply cannot risk an information leak of any variety and instead opt to host their code repositories either within Atlassian’s BitBucket (formerly known as Stash) or GitLab on-premises, locked away within their own cloud infrastructure.

To follow best practices for the basic security principle of privilege separation, it is important for your repositories to offer users only a minimal set of permissions. GitHub and the like have matured significantly over the years and now provide granular permissions to achieve such levels of separation. Try the following for a hands-on demonstration of generating permission-based access tokens within GitHub:

  1. Sign into the web interface and click Settings, under the top-right menu for your user, and then choose Developer Settings at the bottom of the navigation menu on the left.
  2. Then click Personal Access Tokens, and you are presented with the ability to generate an access token. GitHub says this about these tokens, which are effectively ordinary OAuth tokens: “They can be used instead of a password for Git over HTTPS or can be used to authenticate to the API over Basic Authentication.” In other words, you should treat your access token in the same way as your GitHub password.
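As a rough illustration of treating a token like a password, the sketch below reads the token from an environment variable instead of hard-coding it, and attaches it to a GitHub API request. The variable name GITHUB_TOKEN and both helper names are assumptions made for this example. The request is prepared but not sent.

```python
import os
import urllib.request

# Endpoint that returns the profile of the authenticated user.
API_URL = "https://api.github.com/user"


def token_from_env(var_name: str = "GITHUB_TOKEN") -> str:
    """Fetch the token from the environment so it never lands in source code.

    The variable name is an assumption for this sketch.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"environment variable {var_name} is not set")
    return token


def build_request(token: str, url: str = API_URL) -> urllib.request.Request:
    """Prepare (but do not send) an API request authenticated with the token."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"token {token}")
    request.add_header("Accept", "application/vnd.github+json")
    return request
```

Passing the result of build_request(token_from_env()) to urllib.request.urlopen would send the request. Keeping the token out of the source file means it cannot be pushed to a repository by accident.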

In Figure 10.1 we can see an abbreviated list of permissions that you might want to fine-tune on a per-user basis.

Aside from the permissions shown in Figure 10.1, privileged users, who can merge code from user-generated pull requests so that changes will be added to the master branch, should obviously be limited in number. In addition, you will also need a way of quickly revoking users that become persona non grata for one reason or another.


Figure 10.1: Fine-grained permissions from GitHub via personal access tokens

Source: github.com/settings/tokens/new

You might have many private repositories and also a handful of public ones, especially if you are Open Sourcing some of your software. As a result, it is essential to enforce public/private classifications carefully to avoid potentially devastating mistakes.

And, to prevent data leaks, it is imperative that you avoid giving too much information away in public repositories. An attacker who learns a developer's real name, username, company, department, and email address might feel empowered to conduct targeted phishing attacks.

If you are concerned about information that needs to be secret and how to integrate it securely in your code repositories, there is an interesting page on GitHub (docs.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets) that explains how to store and encrypt secrets.

Let's look at installing and configuring tools that can automatically assist with some of the issues that we have looked at. There are a number of popular options for enumerating and sifting through GitHub repositories. Such tools include the excellent GitRob (github.com/michenriksen/gitrob), which we will try in the second half of the chapter. It is fully featured, requires minimal installation effort, and is popular thanks to the fact that rules can be customized with relative ease.

First, however, we will look at another clever piece of Open Source software called Gitleaks (github.com/zricethezav/gitleaks) to get us started. According to the Gitleaks GitHub page, it is “a SAST tool for detecting hardcoded secrets like passwords, API keys, and tokens in git repos.”

Installing and Running Gitleaks

In true Cloud Native form, we will opt for the Docker method to run Gitleaks by using the provided container image. Clearly, it is really important that you explicitly trust container images, and it is recommended that you get into the habit of building an image from a Dockerfile directly to make sure that you know exactly what is contained within it. The Dockerfile for Gitleaks can be found at github.com/zricethezav/gitleaks/blob/master/Dockerfile, and it builds on a base image for the Go language.

To get started, we will pull the image down with Docker (you can use any OCI-compatible runtime you like):

$ docker pull zricethezav/gitleaks
Using default tag: latest
latest: Pulling from zricethezav/gitleaks
aad63a933944: Pull complete
02ab6908a836: Pull complete
1b175b4469a1: Pull complete
Digest: sha256:8207101097bf84f3ed[. . .snip. . .]98012f5e447bc16256a7ca3ac7
Status: Downloaded newer image for zricethezav/gitleaks:latest
docker.io/zricethezav/gitleaks:latest

There are also installation options for brew on Macs and the Go language, shown respectively within the documentation as the following commands:

$ brew install gitleaks
$ GO111MODULE=on go get github.com/zricethezav/gitleaks/v4

There is a bundled configuration file full of useful default settings that can be found at github.com/zricethezav/gitleaks/blob/master/config/default.go and then finely tuned afterward as you see fit. Let's look at an example now. From the top, the second rule down within that file uses a regular expression (regex) to look for secret keys from AWS.


[[rules]]
        description = "AWS Secret Key"
        regex = '''(?i)aws(.{0,20})?(?-i)['"][0-9a-zA-Z/+]{40}['"]'''
        tags = ["key", "AWS"]

As you can see, the rule uses a regular expression (regex) that is used to catch the format of a typical AWS secret key. This is how Gitleaks spots information leakage within a repository, and you will find this is the common approach of such tools—just as we will see with GitRob in a moment. Using regex means that the rulesets are extensible and relatively easy to alter and fine-tune to suit your needs (although admittedly sometimes regex can have an arcane syntax). Let's take a look at Gitleaks in action next.
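To make the pattern's behavior concrete, here is a small sketch that tests the printed rule against two sample lines. It is only an approximation of what Gitleaks does: the rule targets Go's regular expression syntax, and Python's re module has no bare (?-i) switch, so the case-insensitive portion is wrapped in a scoped group instead. The sample lines and the function name are invented for this example, and the key is the bogus one from this chapter's sample file.

```python
import re

# Python translation of the printed rule: "(?i) ... (?-i)" in Go syntax
# becomes the scoped group "(?i: ... )" here.
AWS_SECRET_KEY = re.compile(r"""(?i:aws(.{0,20})?)['"][0-9a-zA-Z/+]{40}['"]""")

# Bogus 40-character key taken from the chapter's sample file.
BOGUS_KEY = "fpR3Hnut+gbNc0vid0Mnf4t2sc2Jkj4i0P1V06Ph"


def matches_rule(line: str) -> bool:
    """Return True when the line matches the translated pattern."""
    return AWS_SECRET_KEY.search(line) is not None


# A quoted assignment, as you might see in source code.
quoted = f'AWS_SECRET_KEY = "{BOGUS_KEY}"'
# An unquoted assignment, in the style of a credentials file.
unquoted = f"aws_secret_access_key = {BOGUS_KEY}"

print(matches_rule(quoted))    # the pattern requires surrounding quotes
print(matches_rule(unquoted))  # no quotes around the key, so no match
```

The pattern as printed expects the 40-character key to be wrapped in single or double quotes, so the first line matches and the second does not. This is the kind of detail worth checking when tuning rules for the formats found in your own codebase.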

To show off its abilities, we will point Gitleaks at a file created in GitHub with some bogus AWS credentials called secret_key.txt. It can be found at github.com/chrisbinnie/CloudNativeSecurity/blob/master/secret_key.txt.

The file is just a replica of what you would expect to see in a ~/.aws/credentials file, which stores your AWS credentials locally if you do not use the preferred environment variables approach, and its contents are shown here:

[default]
aws_access_key_id = AKIAYEKNPWXOCW4YDEWX
aws_secret_access_key = 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

Let's try a simple scan using this command to run Gitleaks along with the required options:

$ docker run zricethezav/gitleaks \
  --repourl=https://github.com/chrisbinnie/CloudNativeSecurity --redact

time="2020-07-20T20:12:04Z" level=info msg="cloning.. https://github.com/chrisbinnie/CloudNativeSecurity"
time="2020-07-20T20:12:04Z" level=info msg="scan time: 2 milliseconds 167 microseconds"
time="2020-07-20T20:12:04Z" level=info msg="commits scanned: 3"
time="2020-07-20T20:12:04Z" level=warning msg="leaks found: 2"

As we can see from the output, there have been two leaks detected in that repository.

We can get more output for further clarity on what content was highlighted as an issue by using the -v switch at the end of the previous command for verbosity:

$ docker run zricethezav/gitleaks \
  --repourl=https://github.com/chrisbinnie/CloudNativeSecurity -v

Using that switch, the abbreviated output is visible in Listing 10.1, which really helps get to the core of the issues highlighted by the tool.

If you get stuck at any point with Gitleaks, you can use the following command, with the appended --help switch as you would expect:

$ docker run zricethezav/gitleaks --help

Additionally, you can check the version like so:

$ docker run zricethezav/gitleaks --version
v4.3.1

To prevent your logs from being populated with potentially sensitive credentials, simply add --redact to the end of the command:

$ docker run zricethezav/gitleaks \
  --repourl=https://github.com/chrisbinnie/CloudNativeSecurity --redact

Enumerating objects: 9, done.
Counting objects: 100% (9/9), done.
Compressing objects: 100% (7/7), done.
Total 9 (delta 0), reused 9 (delta 0), pack-reused 0
{
     "line": "aws_access_key_id = REDACTED",
     "lineNumber": 2,
     "offender": "REDACTED",
     "commit": "3c7c6639b9b7c34d5c192ef017e3047d2ac671d0",
     "repo": "CloudNativeSecurity",
     "repoURL":
       "https://github.com/chrisbinnie/CloudNativeSecurity",
     "leakURL":
       "https://github.com/chrisbinnie/CloudNativeSecurity/blob/[. . .snip. . .]/secret_key.txt#L2",
     "rule": "AWS Access Key",
     "commitMessage": "Add bad creds\n",
     "author": "Chris Binnie",
     "email": "[email protected]",
     "file": "secret_key.txt",
     "date": "2020-07-19T14:33:57+01:00",
     "tags": "key, AWS"
}

Another useful option is --pretty. If any leaks are found, this will output nicely formatted JSON, which is probably easier for a CI/CD pipeline test to parse than the other output. In Listing 10.2, you can see an abbreviated output for the leaks found.
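As a sketch of how a pipeline step might consume such JSON, the following counts findings per rule and derives an exit code. It assumes the findings have already been collected into a single JSON array, which is a hypothetical report shape chosen for this example. The field name "rule" is taken from the output shown earlier, and the function names are invented.

```python
import json
from collections import Counter


def load_leaks(report_text: str) -> list:
    """Parse the report, assumed here to be a JSON array of finding objects."""
    leaks = json.loads(report_text)
    if not isinstance(leaks, list):
        raise ValueError("expected a JSON array of findings")
    return leaks


def leaks_per_rule(leaks: list) -> Counter:
    """Count findings by the name of the rule that triggered them."""
    return Counter(leak.get("rule", "unknown") for leak in leaks)


def pipeline_exit_code(leaks: list) -> int:
    """Return non-zero when anything was found, so the pipeline stage fails."""
    return 1 if leaks else 0
```

A pipeline job could print the per-rule counts for the build log and then exit with pipeline_exit_code(leaks), failing the stage whenever a finding is present.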

Within a pipeline you could add a tool like Gitleaks to your unit tests, to your integration tests, or as a pre-commit hook. Equally, even in tandem, you could run a tool such as this on a schedule of some description, sweeping all of your repositories overnight at quiet times, for example, and then reporting back. Note that AWS actually performs tests like these itself on GitHub (or at least did in the past) to ensure that user accounts are not compromised by a moment of cut-and-paste madness. Should AWS get in touch about such an issue, you will see a warning at the top of the AWS Console under Alerts/Notifications, and potentially the root user of the AWS account or AWS organization might also receive an email or a similar notification.
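The pre-commit idea can be sketched in a few lines. The example below is not Gitleaks itself; it is a deliberately simplified stand-in that looks for strings shaped like AWS access key IDs in the staged changes. The pattern is a commonly used simplification and all the function names are assumptions for this example. The reported line numbers refer to lines of the diff output, not of the original files.

```python
import re
import subprocess

# Simplified pattern: "AKIA" followed by 16 upper-case letters or digits.
ACCESS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_access_key_ids(text: str) -> list:
    """Return (line number, redacted key) pairs for every suspected key ID."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            key = match.group(1)
            # Keep only the prefix so the hook's own output leaks nothing.
            hits.append((number, key[:4] + "*" * (len(key) - 4)))
    return hits


def staged_diff() -> str:
    """Collect the changes that are staged for the next commit."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def main() -> int:
    """Return 1 (blocking the commit) when a suspected key ID is staged."""
    hits = find_access_key_ids(staged_diff())
    for number, redacted in hits:
        print(f"possible AWS access key ID on diff line {number}: {redacted}")
    return 1 if hits else 0

# A hook script saved as .git/hooks/pre-commit would end with:
#     raise SystemExit(main())
```

Git aborts the commit when a pre-commit hook exits with a non-zero status, which is what gives a check like this its teeth. In practice you would call a dedicated scanner with a full ruleset from the hook, not a single hand-written pattern.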

We will look at another similar tool in a moment, GitRob, but let's quickly look at some other alert types from Gitleaks. The Gitleaks GitHub repository itself holds some treasure to plunder for testing purposes, so we will run the tool against that next. Using the first command shown earlier, it is possible to find 811 leaks detected and 584 commits audited; in Listing 10.3 you can see an abbreviated output in verbose mode.

There are lots of other options that Gitleaks can support, and you are encouraged to fine-tune the output to your needs.

Installing and Running GitRob

Now let's take a look at a similar tool, called GitRob, which can be found on GitHub as you'd expect (github.com/michenriksen/gitrob). GitRob is possibly the first Open Source tool that springs to mind when people think about scanning code repositories. As we will see, it offers some different features from Gitleaks.

To get started, you are offered the choice of compiling the tool from source code using the Go language or using a precompiled binary (github.com/michenriksen/gitrob/releases/tag/v2.0.0-beta). Assuming you trust the provenance of the binary (and that it is not a security threat), you can download Linux, Mac, and Windows versions to suit your needs. One way to test the provenance is by using a checksum. Let's download the Linux Zip file and validate its checksum using these commands where the long URLs are executed on one line each:

$ wget https://github.com/michenriksen/gitrob/releases/download/v2.0.0-beta/gitrob_linux_amd64_2.0.0-beta.zip
$ wget https://github.com/michenriksen/gitrob/releases/download/v2.0.0-beta/checksums.txt

Next, view the published checksums for the release files:

$ cat checksums.txt
1ec57a99c9a4c7fde9041077f6007330873b6710ccc45ce77814410b5289ad7c  gitrob_linux_amd64_2.0.0-beta.zip
f8e429a1a7f36877b9691a473d4b0a053eac2ce26d12cdfc1b7c57a3504bbb7c  gitrob_macos_amd64_2.0.0-beta.zip
51608bcdb7dfd2446379678b792c266a9fdb649d0296db2b130f12c587d962e7  gitrob_windows_amd64_2.0.0-beta.zip

You can then verify that the result of this next command matches the previous output for the same filename:

$ sha256sum gitrob_linux_amd64_2.0.0-beta.zip
1ec57a99c9a4c7fde9041077f6007330873b6710ccc45ce77814410b5289ad7c  gitrob_linux_amd64_2.0.0-beta.zip

Excellent. As we can see from the top line in the checksums.txt file, our sha256sum command's output matches, which means our file has not been tampered with en route during its download. This offers a little comfort, but of course if the code repository itself had been compromised, both the checksums and a precompiled binary might have been replaced, and the binary might contain unknown threats.
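If you would like to automate that comparison, for example in a provisioning script, it might look something like the following sketch. It assumes the checksums file uses the "digest, whitespace, filename" layout shown above; the function names are invented for this example.

```python
import hashlib
import os


def expected_digest(checksums_text: str, filename: str) -> str:
    """Look up the published digest for a filename in a checksums listing."""
    for line in checksums_text.splitlines():
        parts = line.split(None, 1)
        # sha256sum may prefix the name with "*" when run in binary mode.
        if len(parts) == 2 and parts[1].strip().lstrip("*") == filename:
            return parts[0].lower()
    raise KeyError(f"no checksum listed for {filename}")


def sha256_of_stream(stream, chunk_size: int = 65536) -> str:
    """Hash a binary stream in chunks so large files need little memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, checksums_path: str = "checksums.txt") -> bool:
    """Compare a downloaded file's digest with the published one."""
    with open(checksums_path, encoding="utf-8") as handle:
        wanted = expected_digest(handle.read(), os.path.basename(path))
    with open(path, "rb") as handle:
        return sha256_of_stream(handle) == wanted
```

Calling verify_download("gitrob_linux_amd64_2.0.0-beta.zip") in the download directory performs the same check as the manual comparison. The same caveat applies: a match only shows that the file agrees with the published checksums, not that the publisher's files are trustworthy.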

Next, we will unzip the ZIP file:

$ unzip gitrob_linux_amd64_2.0.0-beta.zip
Archive:  gitrob_linux_amd64_2.0.0-beta.zip
  inflating: gitrob
  inflating: README.md

The README.md file is the same as the one found in the GitHub repository for GitRob. You can find more useful information by executing the gitrob binary to check the contents of its --help output. In Listing 10.4 we can see some of the tool's options, which are not too dissimilar to Gitleaks’ options.

Let's see what GitRob makes of our secret_key.txt file, hosted in the GitHub repository. A little earlier in this chapter, we walked through creating an access token in GitHub, instead of using a password. We will need to add an environment variable with an access token to scan the repository as follows:

$ export GITROB_ACCESS_TOKEN=9e3b27d7c382XXXXXXXXXXXXX96b1a45ea351c24

Next, we will run GitRob against the entire GitHub user account for chrisbinnie (to scan all of its repositories in one go) with this command:

$ ./gitrob chrisbinnie

In Figure 10.2 you can see what the aesthetically pleasing output looks like.


Figure 10.2: GitRob initializing and beginning to scan all repositories belonging to chrisbinnie in GitHub

It is worrying that this tool, using the default settings, failed to spot the secret in the secret_key.txt file that Gitleaks found. However, it instead found another purposely placed file in a separate repository:

Analyzing 7 repositories..
 INSERT: AWS CLI credentials file
  Path….: .aws/credentials

It is important to note that Gitleaks missed this second file using its default configuration.

This would suggest that GitRob has cloned the repositories but not pattern-matched the contents of the secret_key.txt file with its default regex. It is possible that the tool only matched the path and filename .aws/credentials using regular expressions. An investigation into its bundled rules configuration would reveal exactly what was caught, what was missed, and why. Those craving some eye strain might start here: github.com/michenriksen/gitrob/blob/master/core/signatures.go.
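The difference between matching on a file's path and matching on its contents can be shown with a toy example. The signatures below are hypothetical and written only for illustration; they are not the rules that GitRob or Gitleaks actually ship.

```python
import re

# Hypothetical path-based signature: flags a file purely because of its name.
PATH_SIGNATURES = {
    "AWS CLI credentials file": re.compile(r"(^|/)\.?aws/credentials$"),
}

# Hypothetical content-based signature: flags a file because of what is in it.
CONTENT_SIGNATURES = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def path_findings(path: str) -> list:
    """Names of the path signatures that match the given file path."""
    return [name for name, pattern in PATH_SIGNATURES.items()
            if pattern.search(path)]


def content_findings(text: str) -> list:
    """Names of the content signatures that match the given file contents."""
    return [name for name, pattern in CONTENT_SIGNATURES.items()
            if pattern.search(text)]
```

With these toy rules, a file named secret_key.txt produces no path findings however sensitive its contents are, while a file named .aws/credentials is flagged by path alone. A scanner that leans on one style of rule can therefore miss what a scanner using the other style catches.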

An important note from an operational perspective is that by default, and apparently for security reasons, when GitRob is run against a repository, the contents of the tool's output are only stored in memory for that session. This means that when GitRob is stopped, the results are lost completely. You can use the -save option to create a report of your findings to analyze later if you are sure that you want to save a scan's results.

Summary

In this chapter, we discussed the likelihood of developers accidentally pushing sensitive tokens, certificates, access keys, and passwords into code repositories and why monitoring your repositories is critically important to avoid unwelcome compromises.

We also looked at two popular Open Source tools, which appear, using default settings, to have differing scanning abilities. As with the CVE scanning that we looked at in Chapter 6, “Container Images CVEs,” the simple examples shown here are a reminder that scanning code repositories with one tool will not magically make all of your data leak worries go away.

It would be prudent to spend some time ensuring that your tool of choice is working exactly as you intend, matching all of the known regular expression patterns that you deal with within your codebase, and potentially taking a belt and braces approach to guarding your highly valuable secrets with more than one tool. It should be clear that how you test for secrets is just as important as the fact that you test at all.
