© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
N. Tolaram, Software Development with Go, https://doi.org/10.1007/978-1-4842-8731-6_8

8. Scorecard

Nanik Tolaram
Sydney, NSW, Australia
 

In this chapter, you will look at an open source security tool called Scorecard. Scorecard provides security metrics for projects you are interested in. The metrics will give you visibility on the security concerns that you need to be aware of regarding the projects.

You will learn how to create GitHub tokens using your GitHub account. The tokens are needed by the tool to extract public GitHub repository information. You will walk through the steps of installing and using the tool. To understand the tool better, you will look at the high-level flow of how the tool works and also at how it uses the GitHub API.

One of the key takeaways of this chapter is how to use the GitHub API and the information that can be extracted from repositories hosted on GitHub. You will learn how to use GraphQL to query repository data from GitHub using an open source library.

Source Code

The source code for this chapter is available from the https://github.com/Apress/Software-Development-Go repository.

What Is Scorecard?

Scorecard is an open source project that analyzes your project’s dependencies and gives ratings about them. The tool performs several checks that can be configured depending on your needs. The checks are associated with software security and are assigned a score of 0 to 10. The tool shows whether dependencies in your project are safe and also provides other security checks such as your GitHub configuration, license checking, and many other useful checks.

The project maintainer runs the tool every day, scanning through thousands of GitHub repositories and scoring them. The score results are publicly available in BigQuery, as shown in Figure 8-1.

A screenshot of the user interface of google cloud depicts various files under 2 projects such as scorecard and query results with various tools including run, save, share, copy, schedule, and more.

Figure 8-1

Scorecard public dataset in BigQuery

To access the public dataset, you need a Google (Gmail) account. Open your browser and go to http://console.cloud.google.com/bigquery. Once the Google Cloud page loads, click Add Data ➤ Pin a Project ➤ Enter project name, as shown in Figure 8-2, and enter openssf as the project name. The dataset will then be displayed on the left side of your screen.

A screenshot of the user interface of google cloud depicts various files under a project such as my first project with various tools including pin project, exploring public datasets, data sources, and search.

Figure 8-2

Add the openssf project

In the next section, you will look at setting up the GitHub token key so that you can use it to scan the GitHub repository of your choice.

Setting Up Scorecard

Scorecard requires a GitHub token key to scan the repository. The reason behind this is the rate limit imposed by GitHub for unauthenticated requests. Let’s walk through the following steps to create a token key in GitHub.
  1. Go to your GitHub page (in my case, https://github.com/nanikjava) and click the top right icon, as shown in Figure 8-3. Then click Settings in the menu to open the profile page.

     

A screenshot represents how to access the settings menu. The drop-down list under the profile has options for your profile, repositories, code spaces, organizations, and projects.

Figure 8-3

Accessing the Settings menu

  2. Once you are on the Profile page, shown in Figure 8-4, click Developer settings.

     

A screenshot lists the menu on the profile page. They are packages, GitHub copilot, pages, saved replies, code security and analysis, applications, developer settings, and security and sponsorship log.

Figure 8-4

Menu on Profile page

  3. You will be taken to the apps page, as shown in Figure 8-5. Click the Personal access tokens link.

     

A screenshot with the text reads GitHub apps, OAuth apps, and personal access tokens. The GitHub apps option is highlighted.

Figure 8-5

Apps page

  4. Once you are inside the tokens page, shown in Figure 8-6, click Generate new token.

     

A screenshot of personal access tokens with 3 options including generating the new tokens, deleting, and revoking all with an expiry date of Thursday august 2022.

Figure 8-6

Tokens page

  5. You will see the new personal token page, shown in Figure 8-7. Fill in the Note textbox with information about what the token is used for and set the expiration to whatever you want. Finally, in the Select scopes section, select the repo tickbox; this will automatically select the rest of the repo permissions that fall under it. Once done, scroll down and click the Generate token button.

     

A screenshot of the new personal access token page includes note names, expiration days, and selection of scope access with personal tokens.

Figure 8-7

Generate a new token page

  6. Once the token has been generated, you will see a screen like Figure 8-8 showing the new token. Copy the token and paste it somewhere in your editor so you can use it in the next section.

     

A screenshot of the personal access tokens with three options including generating the new tokens, deleting, and revoking all with an expiry date.

Figure 8-8

Token successfully generated
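To see why the token matters, the following minimal sketch shows how a Go program could attach a token to a request and read back the remaining request quota. It is an illustration only, not Scorecard's code: the helper names (newGitHubRequest, remainingQuota, demo) are hypothetical, and the sketch talks to a local stand-in server so it runs offline. The stand-in imitates GitHub's documented hourly limits of 60 unauthenticated and 5,000 authenticated requests.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
)

// newGitHubRequest builds a GET request and, when a token is supplied,
// attaches it in the Authorization header.
func newGitHubRequest(url, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/vnd.github+json")
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

// remainingQuota sends the request and returns the X-RateLimit-Remaining header.
func remainingQuota(client *http.Client, req *http.Request) (string, error) {
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return resp.Header.Get("X-RateLimit-Remaining"), nil
}

// demo talks to a local stand-in server (an assumption made so the sketch
// runs offline). The stand-in reports a larger quota for authenticated calls.
func demo(token string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		quota := "60"
		if r.Header.Get("Authorization") != "" {
			quota = "5000"
		}
		w.Header().Set("X-RateLimit-Remaining", quota)
	}))
	defer srv.Close()

	req, err := newGitHubRequest(srv.URL, token)
	if err != nil {
		return "", err
	}
	return remainingQuota(srv.Client(), req)
}

func main() {
	quota, err := demo(os.Getenv("GITHUB_AUTH_TOKEN"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("remaining requests:", quota)
}
```

To try it against the real service, you could replace the stand-in URL with https://api.github.com/rate_limit and pass in your own token.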

In the next section, you will use the token you generated to build and run Scorecard.

Running Scorecard

Download the tool from the project GitHub repository. For this chapter, you’ll use v4.4.0; the binary can be downloaded from https://github.com/ossf/scorecard/releases/tag/v4.4.0. Once you download the archive file, unzip it to a directory on your local machine.

Execute Scorecard to check it’s working.
/directory/scorecard help
You will see the following output in your console:
A program that shows security scorecard for an open source software.
Usage:
  ./scorecard --repo=<repo_url> [--checks=check1,...] [--show-details]
or ./scorecard --{npm,pypi,rubygems}=<package_name> [--checks=check1,...] [--show-details] [flags]
  ./scorecard [command]
...
Flags:
      ...
Use "./scorecard [command] --help" for more information about a command.
Now that Scorecard is working on your machine, let's use the token you generated in the previous section to scan a repository. For this example, you will scan the github.com/ossf/scorecard repository. Open a terminal and execute the following command:
GITHUB_AUTH_TOKEN=<github_token> /directory_of_scorecard/scorecard --repo=github.com/ossf/scorecard
Replace <github_token> with your GitHub token. The tool will take a bit of time to run because it is scanning and doing checks on the GitHub repository. Once complete, you will see output something like Figure 8-9.

A screenshot of the scorecard output. A table that includes scores out of 10, names, reasons, and documentation or remediation.

Figure 8-9

Scorecard output

You have successfully run the tool to scan a GitHub repository and received an output with a high score of 8.0. A higher score indicates that the repository is doing all the right things as per the predefined checks in the tool.
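If you want to trigger the same scan from a Go program instead of the shell, a sketch along the following lines could work. It only wraps the command shown earlier using os/exec; the function name scorecardCommand is hypothetical, and the binary path is the same placeholder used in the command line above.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// scorecardCommand prepares (but does not start) a Scorecard run for one repository.
func scorecardCommand(binary, token, repo string) *exec.Cmd {
	cmd := exec.Command(binary, "--repo="+repo)
	// Inherit the current environment and add the token for this process only.
	cmd.Env = append(os.Environ(), "GITHUB_AUTH_TOKEN="+token)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	token := os.Getenv("GITHUB_AUTH_TOKEN")
	if token == "" {
		fmt.Println("set GITHUB_AUTH_TOKEN before running this sketch")
		return
	}
	// The binary path is a placeholder; point it at your unzipped download.
	cmd := scorecardCommand("/directory_of_scorecard/scorecard", token, "github.com/ossf/scorecard")
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "scorecard failed:", err)
		os.Exit(1)
	}
}
```

Passing the token through cmd.Env keeps it scoped to the child process rather than exporting it for the whole shell session.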

In the next section, you will further explore the tool to understand how it works and go through code for different parts of the tool.

High-Level Flow

In this section, you will go in depth to understand what the tool is doing and look at code from the different parts of the tool. In digging through the code, you will uncover new things that can be used when designing your own application. First, let’s take a high-level look at the process of the tool, as shown in Figure 8-10.

A flow chart of high-level data includes start, check connections, download scanned information, goroutine, collect results, send results, and sort and print results.

Figure 8-10

High-level flow

Use this diagram as a reference when you read the different parts of the application along with the code. The first thing that the tool does when it starts up is check whether it is able to use the provided token to access GitHub. It is hard-coded to test GitHub connectivity by accessing the github.com/google/oss-fuzz repository (step 2). This is shown in the following code snippet (checker/client.go):
func GetClients(...) (
  ...
) {
  ...
  ossFuzzRepoClient, errOssFuzz := ghrepo.CreateOssFuzzRepoClient(ctx, logger)
  ...
}
func CreateOssFuzzRepoClient(ctx context.Context, logger *log.Logger) (clients.RepoClient, error) {
  ossFuzzRepo, err := MakeGithubRepo("google/oss-fuzz")
  ...
  return ossFuzzRepoClient, nil
}
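The idea of probing a known repository before doing any real work can be sketched with the standard library alone. The following is a simplified illustration, not Scorecard's implementation: probeRepo, errRepoUnreachable, and demo are hypothetical names, and a local stand-in server plays the role of GitHub so that the sketch runs offline.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// errRepoUnreachable is a hypothetical sentinel error for failed probes.
var errRepoUnreachable = errors.New("repository unreachable")

// probeRepo requests /repos/{owner}/{repo} and treats anything other than
// HTTP 200 as a failed connectivity check.
func probeRepo(client *http.Client, baseURL, owner, repo string) error {
	url := fmt.Sprintf("%s/repos/%s/%s", baseURL, owner, repo)
	resp, err := client.Get(url)
	if err != nil {
		return fmt.Errorf("%w: %v", errRepoUnreachable, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("%w: unexpected status %s", errRepoUnreachable, resp.Status)
	}
	return nil
}

// demo runs the probe against a stand-in server that only knows one repository.
func demo(owner, repo string) error {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/repos/google/oss-fuzz" {
			w.WriteHeader(http.StatusOK)
			return
		}
		http.NotFound(w, r)
	}))
	defer srv.Close()
	return probeRepo(srv.Client(), srv.URL, owner, repo)
}

func main() {
	fmt.Println("known repo:", demo("google", "oss-fuzz"))
	fmt.Println("unknown repo:", demo("google", "missing"))
}
```

Failing fast at this point means a bad token or a network problem is reported once, before any of the checks start.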
The code continues after successfully connecting to the GitHub repository by assigning the connection to different GitHub handlers. These handlers use the connection to get different information from the repository (step 3) that will be used to perform security checks. The code for the handler assignment is as follows (clients/githubrepo/client.go):
func (client *Client) InitRepo(inputRepo clients.Repo, commitSHA string) error {
  ...
  // Sanity check.
  repo, _, err := client.repoClient.Repositories.Get(client.ctx, ghRepo.owner, ghRepo.repo)
  if err != nil {
     return sce.WithMessage(sce.ErrRepoUnreachable, err.Error())
  }
  client.repo = repo
  client.repourl = &repoURL{
     owner:         repo.Owner.GetLogin(),
     ...
     commitSHA:     commitSHA,
  }
  client.tarball.init(client.ctx, client.repo, commitSHA)
  // Setup GraphQL.
  client.graphClient.init(client.ctx, client.repourl)
  client.contributors.init(client.ctx, client.repourl)
  ...
  client.webhook.init(client.ctx, client.repourl)
  client.languages.init(client.ctx, client.repourl)
  return nil
}
Figure 8-11 outlines the subset of GitHub handlers that use the different GitHub connections.

A flowchart of GitHub connections has three handlers such as release, branches, workflow connects parallel with GitHub 1 and 4 clients.

Figure 8-11

GitHub handlers using GitHub connections

Once the handlers are initialized successfully with the GitHub connections, the main part of the tool kicks in (step 4). The tool spawns a goroutine that runs the enabled security checks, using the information downloaded through the GitHub connections. As the runEnabledChecks(...) snippet later in this section shows, each check in turn runs in its own goroutine. The code that starts the goroutine is as follows (pkg/scorecard.go):
func RunScorecards(ctx context.Context,
  ...
) (ScorecardResult, error) {
  ...
  resultsCh := make(chan checker.CheckResult)
  go runEnabledChecks(ctx, repo, &ret.RawResults, checksToRun, repoClient, ossFuzzRepoClient,
     ciiClient, vulnsClient, resultsCh)
  ...
  return ret, nil
}
Figure 8-12 shows a subset of different security checks that are performed on the GitHub repository.

A flowchart of security checks includes binary artifacts, branch protection, check contributors, and dangerous workflow.

Figure 8-12

Security checks

The runEnabledChecks(...) code snippet is shown next. The function executes each check that has been configured (step 6). On completion, the results are passed back via the resultsCh channel (step 7).
func runEnabledChecks(...
  resultsCh chan checker.CheckResult,
) {
  ...
  wg := sync.WaitGroup{}
  for checkName, checkFn := range checksToRun {
     checkName := checkName
     checkFn := checkFn
     wg.Add(1)
     go func() {
        defer wg.Done()
        runner := checker.NewRunner(
           checkName,
           repo.URI(),
           &request,
        )
        resultsCh <- runner.Run(ctx, checkFn)
     }()
  }
  wg.Wait()
  close(resultsCh)
}
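The pattern used here, one goroutine per check, a WaitGroup to know when all of them are done, and a channel to hand results back, can be reproduced in a small standalone program. The sketch below is a simplified imitation: the checkResult and checkFn types and the scores are hypothetical, and the check names are borrowed from Figure 8-12.

```go
package main

import (
	"fmt"
	"sync"
)

// checkResult and checkFn are hypothetical stand-ins for the tool's types.
type checkResult struct {
	Name  string
	Score int
}

type checkFn func() int

// runChecks starts one goroutine per check and closes the channel once
// every check has delivered its result.
func runChecks(checks map[string]checkFn, resultsCh chan<- checkResult) {
	var wg sync.WaitGroup
	for name, fn := range checks {
		name, fn := name, fn // capture per-iteration copies
		wg.Add(1)
		go func() {
			defer wg.Done()
			resultsCh <- checkResult{Name: name, Score: fn()}
		}()
	}
	wg.Wait()
	close(resultsCh)
}

// collect runs the checks in the background and drains the channel until
// it is closed.
func collect(checks map[string]checkFn) []checkResult {
	resultsCh := make(chan checkResult)
	go runChecks(checks, resultsCh)
	var results []checkResult
	for r := range resultsCh {
		results = append(results, r)
	}
	return results
}

func main() {
	checks := map[string]checkFn{
		"Binary-Artifacts":   func() int { return 10 },
		"Branch-Protection":  func() int { return 8 },
		"Dangerous-Workflow": func() int { return 10 },
	}
	for _, r := range collect(checks) {
		fmt.Printf("%s: %d\n", r.Name, r.Score)
	}
}
```

Because the channel is unbuffered, the producer must run in its own goroutine while the caller receives; closing the channel after wg.Wait() is what ends the range loop. The results arrive in no particular order.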
The final step of the tool is collecting, formatting, and scoring the results (step 8). The output destination is configurable: results are displayed on the console (default) or written to a file. The code snippet is shown here (scorecard/cmd/root.go):
func rootCmd(o *options.Options) {
  ...
  repoResult, err := pkg.RunScorecards(
     ctx,
     ...
  )
  if err != nil {
     log.Panic(err)
  }
  repoResult.Metadata = append(repoResult.Metadata, o.Metadata...)
  sort.Slice(repoResult.Checks, func(i, j int) bool {
     return repoResult.Checks[i].Name < repoResult.Checks[j].Name
  })
  ...
  resultsErr := pkg.FormatResults(
     o,
     &repoResult,
     checkDocs,
     pol,
  )
  ...
}
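Sorting the results by name before printing gives stable, readable output even though the checks finish in arbitrary order. The following sketch shows that step in isolation using sort.Slice and text/tabwriter. The checkResult type, the scores, and the reasons are hypothetical, and the table layout only loosely resembles the tool's real output.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"sort"
	"strings"
	"text/tabwriter"
)

// checkResult is a hypothetical, trimmed-down result record.
type checkResult struct {
	Name   string
	Score  int
	Reason string
}

// formatResults sorts the results by name (in place) and writes them as an
// aligned table.
func formatResults(w io.Writer, results []checkResult) error {
	sort.Slice(results, func(i, j int) bool {
		return results[i].Name < results[j].Name
	})
	tw := tabwriter.NewWriter(w, 0, 4, 2, ' ', 0)
	fmt.Fprintln(tw, "SCORE\tNAME\tREASON")
	for _, r := range results {
		fmt.Fprintf(tw, "%d / 10\t%s\t%s\n", r.Score, r.Name, r.Reason)
	}
	return tw.Flush()
}

// render returns the table as a string, which is convenient when the
// output goes to a file or a test instead of the console.
func render(results []checkResult) string {
	var sb strings.Builder
	if err := formatResults(&sb, results); err != nil {
		return ""
	}
	return sb.String()
}

func main() {
	results := []checkResult{
		{Name: "Dangerous-Workflow", Score: 10, Reason: "no dangerous workflow patterns detected"},
		{Name: "Binary-Artifacts", Score: 10, Reason: "no binaries found in the repo"},
	}
	if err := formatResults(os.Stdout, results); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
	fmt.Println(len(render(results)), "bytes rendered")
}
```

Writing to an io.Writer rather than directly to the console is what makes it straightforward to switch between console and file output.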

One thing that you learn from the tool is the usage of the GitHub API. The tool uses the GitHub API extensively: it downloads information about the repository and runs the predefined security checks against that information. You are now going to take a look at how to use the GitHub API to do some GitHub exploration.

GitHub

Anyone who works with software knows about GitHub and has used it one way or another. You can find most kinds of open source software on GitHub, where it is hosted for free. It has become the go-to destination for anyone who dabbles in software.

GitHub provides an API that allows external tools to interact with the services. The API opens up unlimited potential for developers to access the GitHub service to build tools that can provide value for their organization. This allows the proliferation of third-party solutions (free and paid) to be made available to the general public. The Scorecard project in this chapter is one of the tools made possible because of the GitHub API.

GitHub API

There are two kinds of GitHub APIs: REST and GraphQL (https://docs.github.com/en/graphql). There are different projects that implement both APIs, which you will look at a bit later.

The REST-based API offers access like a normal HTTP call. For example, using your own browser you can type in the following address:
https://api.github.com/users/test
You will see the following JSON response in your browser:
{
  "login": "test",
  "id": 383316,
  "node_id": "MDQ6VXNlcjM4MzMxNg==",
  "avatar_url": "https://avatars.githubusercontent.com/u/383316?v=4",
  "gravatar_id": "",
  "url": "https://api.github.com/users/test",
  "html_url": "https://github.com/test",
   ...
  "created_at": "2010-09-01T10:39:12Z",
  "updated_at": "2020-04-24T20:58:44Z"
}
You are seeing information about a username called test that is registered in GitHub. You can try to use your own GitHub username and you will see details about yourself. Let’s get the list of repositories for a particular organization. Type in the following in your browser address:
https://api.github.com/orgs/golang/repos
This address returns the list of public repositories under a particular organization on GitHub. In this example, you request the repositories hosted under the golang organization. You will get the following response:
[
  {
    "id": 1914329,
    "node_id": "MDEwOlJlcG9zaXRvcnkxOTE0MzI5",
    "name": "gddo",
    "full_name": "golang/gddo",
    "private": false,
    "owner": {
      "login": "golang",
      "id": 4314092,
      ...
    },
    "html_url": "https://github.com/golang/gddo",
    "description": "Go Doc Dot Org",
    "fork": false,
    ...
    "license": {
      ...
    },
    ...
    "permissions": {
      ...
    }
  },
  { ... }
]

The response is in JSON format. The information is the same as what you see when you visit the Golang project page at https://github.com/golang. The GitHub documentation at https://docs.github.com/en/rest provides a complete list of the REST endpoints that are accessible.
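To get a feel for what calling such an endpoint from Go involves without any third-party library, here is a minimal sketch that requests /orgs/{org}/repos and decodes a few fields. The repository struct, listOrgRepos, and demo are hypothetical names, and only a handful of the fields from the response above are mapped. A local stand-in server returns a trimmed copy of that response so the sketch runs offline.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// repository maps a small subset of the fields in the JSON response.
type repository struct {
	Name        string `json:"name"`
	FullName    string `json:"full_name"`
	Description string `json:"description"`
	Fork        bool   `json:"fork"`
}

// listOrgRepos calls {baseURL}/orgs/{org}/repos and decodes the JSON array.
func listOrgRepos(client *http.Client, baseURL, org string) ([]repository, error) {
	resp, err := client.Get(baseURL + "/orgs/" + org + "/repos")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var repos []repository
	if err := json.NewDecoder(resp.Body).Decode(&repos); err != nil {
		return nil, err
	}
	return repos, nil
}

// sampleBody is a trimmed copy of the response shown above.
const sampleBody = `[{"id":1914329,"name":"gddo","full_name":"golang/gddo",
"private":false,"html_url":"https://github.com/golang/gddo",
"description":"Go Doc Dot Org","fork":false}]`

// demo serves sampleBody from a local stand-in server instead of GitHub.
func demo() ([]repository, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/orgs/golang/repos" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, sampleBody)
	}))
	defer srv.Close()
	return listOrgRepos(srv.Client(), srv.URL, "golang")
}

func main() {
	repos, err := demo()
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	for _, r := range repos {
		fmt.Printf("%s (%s), fork=%v\n", r.FullName, r.Description, r.Fork)
	}
}
```

Pointing baseURL at https://api.github.com would query the real service; fields you do not declare in the struct are simply ignored by the decoder.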

Using the API in a Go application requires you to wrap each endpoint in a function that your application can call, which is time consuming. Instead, you can use an open source Go library from https://github.com/google/go-github. Let's run an example that uses this library, which can be found inside the chapter8/simple folder. Open your terminal and run it as follows:
go run main.go
You will get the following output:
2022/07/16 18:43:43 {
  "id": 23096959,
  "node_id": "MDEwOlJlcG9zaXRvcnkyMzA5Njk1OQ==",
  "owner": {
    "login": "golang",
    "id": 4314092,
    ...
  },
  "name": "go",
  "full_name": "golang/go",
  "description": "The Go programming language",
  "homepage": "https://go.dev",
  ...
  "organization": {
    "login": "golang",
    "id": 4314092,
    ...
  },
  "topics": [
    "go",
    ...
  ],
  ...
  "license": {
    ...
  },
  ...
}
The sample uses the library to get information about a particular repository, http://github.com/golang/go, which is shown in the following code snippet:
package main
import (
  ...
  "github.com/google/go-github/v38/github"
)
func main() {
  client := github.NewClient(&http.Client{})
  ctx := context.Background()
  repo, _, err := client.Repositories.Get(ctx, "golang", "go")
  ...
  log.Println(string(r))
}

The application starts off by initializing the library by calling github.NewClient(..) and passing in http.Client, which is used to make HTTP calls to GitHub. The library package github.com/google/go-github/v38/github provides all the different functions required. In the example, you use Repositories.Get(..) to obtain information about a particular repository, identified by its owner (golang) and its name (go).

Looking at the library source code (github.com/google/go-github/v38/github/repos.go), you can see that it is performing a similar call to what is defined in the documentation at https://docs.github.com/en/rest/repos/repos#get-a-repository.
func (s *RepositoriesService) Get(ctx context.Context, owner, repo string) (*Repository, *Response, error) {
  u := fmt.Sprintf("repos/%v/%v", owner, repo)
  req, err := s.client.NewRequest("GET", u, nil)
  if err != nil {
     return nil, nil, err
  }
  ...
  return repository, resp, nil
}

You get the same response using https://api.github.com/repos/golang/go in your browser.

The other API that is provided by GitHub is called the GraphQL API (https://docs.github.com/en/graphql) and it is very different from the REST API. It is based on GraphQL (https://graphql.org/), which the website describes as follows:

GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data. GraphQL provides a complete and understandable description of the data in your API, gives clients the power to ask for exactly what they need and nothing more, makes it easier to evolve APIs over time, and enables powerful developer tools.

When using the REST API, you normally need to call different endpoints to get different kinds of data. Once all of the data is collected, you need to combine it into one structure. GraphQL makes it simple: you just have to define what repository data you want, and it will return the data you requested as one single collection.

This will become clearer when you look at the sample application provided. Open your terminal and run the sample inside chapter8/graphql. Run it as follows:
GITHUB_TOKEN=<your_github_token> go run main.go
You need to use the GitHub token you created previously in the section “Setting Up Scorecard.” On a successful run, you will get the following (the output will differ because the data is obtained from GitHub in real time, which will have changed by the time you run this sample):
2022/07/16 19:39:00 Total number of fork :  15116
2022/07/16 19:39:00 Total number of labels :  10
2022/07/16 19:39:00 ----------------------------------
2022/07/16 19:39:00 Issue title - cmd/cgo: fails with gcc 4.4.1
2022/07/16 19:39:00 Issue title - net: LookupHost is returning odd values and crashing net tests
2022/07/16 19:39:00 Issue title - Problem with quietgcc
2022/07/16 19:39:00 Issue title - Segmentation fault on OS X 10.5 386 for "net" test
2022/07/16 19:39:00 Issue title - HTTP client&server tests fail.  DNS_ServerName and URL_Target strings conjoined into nonsense.
2022/07/16 19:39:00 Issue title - all.bash segfault
2022/07/16 19:39:00 Issue title - Crash when running tests, no tests matching.
2022/07/16 19:39:00 Issue title - go-mode.el breaks when editing empty file
2022/07/16 19:39:00 Issue title - I have already used the name for *MY* programming language
2022/07/16 19:39:00 Issue title - throw: index out of range during all.bash
2022/07/16 19:39:00 ----------------------------------
2022/07/16 19:39:00 Commit author (dmitshur), url (https://github.com/dmitshur)
2022/07/16 19:39:00 Commit author (eaigner), url (https://github.com/eaigner)
2022/07/16 19:39:00 Commit author (nordicdyno), url (https://github.com/nordicdyno)
2022/07/16 19:39:00 Commit author (minux), url (https://github.com/minux)
2022/07/16 19:39:00 Commit author (needkane), url (https://github.com/needkane)
2022/07/16 19:39:00 Commit author (nigeltao), url (https://github.com/nigeltao)
2022/07/16 19:39:00 Commit author (nigeltao), url (https://github.com/nigeltao)
2022/07/16 19:39:00 Commit author (h4ck3rm1k3), url (https://github.com/h4ck3rm1k3)
2022/07/16 19:39:00 Commit author (trombonehero), url (https://github.com/trombonehero)
2022/07/16 19:39:00 Commit author (adg), url (https://github.com/adg)

The output shows information obtained from the http://github.com/golang/go repository: the fork count, the number of labels returned, the titles of the first 10 issues, and the authors of the first 10 commit comments. As you will see when you walk through the code, this kind of information is easy to obtain with the GraphQL API.

The main part of the GraphQL API is the query that the sample passes to the GitHub endpoint, which looks like the following:
query ($name: String!, $owner: String!) {
  repository(owner: $owner, name: $name) {
    createdAt
    forkCount
    labels(first: 10) {
      edges {
        node {
          name
        }
      }
    }
    issues(first: 10) {
      edges {
        node {
          title
        }
      }
    }
    commitComments(first: 10) {
      totalCount
      edges {
        node {
          author {
            url
            login
          }
        }
      }
    }
  }
}
The query basically describes to GitHub the repository information that you are interested in. It starts off by defining that the query will pass in two parameters ($name and $owner) and the top level of the information that you want is a repository. Inside the repository, you specified that you want the following:
  • createdAt

  • forkCount

  • labels (the first 10 labels)

  • issues (the first 10 issues)

  • commitComments (the first 10 comments)
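Under the hood, a GraphQL call is an HTTP POST whose JSON body carries the query text and its variables. The sketch below illustrates this with the standard library only, using a shortened version of the query above. The names postQuery, graphQLRequest, and demo are hypothetical, the token is a placeholder, and the reply comes from a local stand-in server so that the sketch runs offline; GitHub's real endpoint is https://api.github.com/graphql.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// graphQLRequest is the JSON body of a GraphQL call: query text plus variables.
type graphQLRequest struct {
	Query     string                 `json:"query"`
	Variables map[string]interface{} `json:"variables"`
}

// repoQuery is a shortened version of the query in the text.
const repoQuery = `query ($name: String!, $owner: String!) {
  repository(owner: $owner, name: $name) {
    createdAt
    forkCount
  }
}`

// cannedReply is what the stand-in server returns (values are illustrative).
const cannedReply = `{"data":{"repository":{"createdAt":"2014-08-19T04:33:40Z","forkCount":15116}}}`

// postQuery sends the query and variables to the endpoint and returns the raw reply.
func postQuery(client *http.Client, endpoint, token, query string, vars map[string]interface{}) ([]byte, error) {
	payload, err := json.Marshal(graphQLRequest{Query: query, Variables: vars})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// demo posts the query to a stand-in server that checks the variables.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var in graphQLRequest
		if err := json.NewDecoder(r.Body).Decode(&in); err != nil ||
			in.Variables["owner"] != "golang" || in.Variables["name"] != "go" {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		io.WriteString(w, cannedReply)
	}))
	defer srv.Close()

	vars := map[string]interface{}{"owner": "golang", "name": "go"}
	body, err := postQuery(srv.Client(), srv.URL, "placeholder-token", repoQuery, vars)
	return string(body), err
}

func main() {
	body, err := demo()
	if err != nil {
		fmt.Println("query failed:", err)
		return
	}
	fmt.Println(body)
}
```

Unlike REST, there is a single endpoint; what changes between calls is only the query and the variables in the request body.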

GitHub provides a tool for creating and testing GraphQL queries, which you will look at in the next section. The GraphQL query cannot be used as is inside your code, so you need to convert it into a Go struct, as shown in the following snippet:
type graphqlData struct {
  Repository struct {
     CreatedAt githubv4.DateTime
     ForkCount githubv4.Int
     Labels    struct {
        Edges []struct {
           Node struct {
              Name githubv4.String
           }
        }
     } `graphql:"labels(first: $labelcount)"`
     Issues struct {
        Edges []struct {
           Node struct {
              Title githubv4.String
           }
        }
     } `graphql:"issues(first: $issuescount)"`
     CommitComments struct {
        TotalCount githubv4.Int
        Edges      []struct {
           Node struct {
              Author struct {
                 URL   githubv4.String
                 Login githubv4.String
              }
           }
        }
     } `graphql:"commitComments(first: $commitcount)"`
  } `graphql:"repository(owner: $owner, name: $name) "`
  RateLimit struct {
     Cost *int
  }
}

The struct definition uses data types that are defined in the library (e.g., githubv4.String, githubv4.Int, etc.).
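To build some intuition for how a struct with graphql tags can stand in for a query, the following sketch derives a query's selection set from a struct type using reflection. It is a deliberately simplified illustration of the general idea and should not be read as the library's actual implementation. The functions selection, lowerFirst, and buildQuery and the trimmed repoQuery struct are hypothetical, and types such as time.Time would need special handling that is omitted here.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
	"unicode"
)

// repoQuery is a trimmed version of the struct in the text, using plain Go types.
type repoQuery struct {
	Repository struct {
		ForkCount int
		Labels    struct {
			Edges []struct {
				Node struct {
					Name string
				}
			}
		} `graphql:"labels(first: $labelcount)"`
	} `graphql:"repository(owner: $owner, name: $name)"`
}

// selection walks a type and builds a GraphQL selection set. A field's
// graphql tag is used verbatim; otherwise the field name is lower-cased.
func selection(t reflect.Type) string {
	// Unwrap pointers and slices to reach the element type.
	for t.Kind() == reflect.Ptr || t.Kind() == reflect.Slice {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return "" // scalar field: nothing nested to select
	}
	var parts []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		name, ok := f.Tag.Lookup("graphql")
		if !ok {
			name = lowerFirst(f.Name)
		}
		parts = append(parts, name+selection(f.Type))
	}
	return "{" + strings.Join(parts, ",") + "}"
}

// lowerFirst turns "ForkCount" into "forkCount".
func lowerFirst(s string) string {
	r := []rune(s)
	r[0] = unicode.ToLower(r[0])
	return string(r)
}

// buildQuery returns the selection set derived from repoQuery.
func buildQuery() string {
	return selection(reflect.TypeOf(repoQuery{}))
}

func main() {
	fmt.Println(buildQuery())
}
```

The commas between fields are treated as insignificant by GraphQL, so the generated text is equivalent to the multi-line form. A full implementation would also have to declare the variables ($owner, $name, $labelcount) and their types in the query header.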

Once you have defined the struct for the GraphQL query, you pass it to the GraphQL library. In this case, you use the open source library hosted at https://github.com/shurcooL/githubv4, as shown here:
func main() {
  ...
  data := new(graphqlData)
  vars := map[string]interface{}{
     "owner":       githubv4.String("golang"),
     "name":        githubv4.String("go"),
     "labelcount":  githubv4.Int(10),
     "issuescount": githubv4.Int(10),
     "commitcount": githubv4.Int(10),
  }
  if err := graphClient.Query(context.Background(), data, vars); err != nil {
     log.Fatal(err)
  }
  log.Println("Total number of fork : ", data.Repository.ForkCount)
  ...
}

The code first creates the graphqlData struct, which the library will populate with the information received from GitHub. It then makes the call to GitHub using the graphClient.Query(..) function, passing in the newly created struct and the variables. The values defined in vars are passed to GitHub as the parameters of the GraphQL query.

Once the .Query(..) function returns successfully, you can use the returned data populated inside the data variable and print it out to the console.
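Reading the populated data is a matter of walking the edges and node levels of each connection. The sketch below shows that traversal on its own. It assumes the raw JSON reply is decoded with encoding/json instead of by the library, so the response type, the sample payload, and the issueTitles helper are all hypothetical; the two issue titles are taken from the output shown earlier.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// response mirrors the envelope of a GraphQL reply: the payload sits under "data".
type response struct {
	Data struct {
		Repository struct {
			ForkCount int `json:"forkCount"`
			Issues    struct {
				Edges []struct {
					Node struct {
						Title string `json:"title"`
					} `json:"node"`
				} `json:"edges"`
			} `json:"issues"`
		} `json:"repository"`
	} `json:"data"`
}

// sample is an illustrative, shortened reply.
const sample = `{"data":{"repository":{"forkCount":15116,"issues":{"edges":[
  {"node":{"title":"cmd/cgo: fails with gcc 4.4.1"}},
  {"node":{"title":"Problem with quietgcc"}}]}}}}`

// issueTitles decodes the reply and flattens edges -> node -> title.
func issueTitles(raw []byte) (int, []string, error) {
	var r response
	if err := json.Unmarshal(raw, &r); err != nil {
		return 0, nil, err
	}
	repo := r.Data.Repository
	titles := make([]string, 0, len(repo.Issues.Edges))
	for _, edge := range repo.Issues.Edges {
		titles = append(titles, edge.Node.Title)
	}
	return repo.ForkCount, titles, nil
}

func main() {
	forks, titles, err := issueTitles([]byte(sample))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("Total number of fork : ", forks)
	for _, t := range titles {
		fmt.Println("Issue title -", t)
	}
}
```

With the library, the same loop would run over data.Repository.Issues.Edges, because the struct you pass to Query(..) has the same edges and node nesting.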

In the next section, you will look at how to use GitHub Explorer to work with GraphQL.

GitHub Explorer

GitHub Explorer is a web-based tool provided by GitHub to allow developers to query GitHub repositories for information. The tool is available from https://docs.github.com/en/graphql/overview/explorer. You must sign in with your GitHub account before using the tool. Once access has been granted, you will see Explorer, as shown in Figure 8-13.

A screenshot of the user interface of GitHub docs has an overview option, resources limitation, breaking changes explorer that includes codes with comments, queries, views, and log-in.

Figure 8-13

GitHub Explorer

Once you are logged in, try the following GraphQL query and click the run button.
{
  repository(owner: "golang", name: "go") {
    createdAt
    diskUsage
    name
  }
}
It queries GitHub for the http://github.com/golang/go repository to extract the creation date, total disk usage, and the name of the project. You will get a response like the following:
{
  "data": {
    "repository": {
      "createdAt": "2014-08-19T04:33:40Z",
      "diskUsage": 310019,
      "name": "go"
    }
  }
}
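A reply like this maps naturally onto a typed Go struct. As a small sketch (the repoInfo type and parseReply function are hypothetical), encoding/json can decode the timestamp straight into a time.Time because createdAt is in RFC 3339 format:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// repoInfo holds the three fields requested in the Explorer query.
type repoInfo struct {
	CreatedAt time.Time `json:"createdAt"`
	DiskUsage int       `json:"diskUsage"`
	Name      string    `json:"name"`
}

// explorerReply is the response shown above, on a single line.
const explorerReply = `{"data":{"repository":{"createdAt":"2014-08-19T04:33:40Z","diskUsage":310019,"name":"go"}}}`

// parseReply unwraps the data/repository envelope and returns the typed fields.
func parseReply(raw string) (repoInfo, error) {
	var envelope struct {
		Data struct {
			Repository repoInfo `json:"repository"`
		} `json:"data"`
	}
	err := json.Unmarshal([]byte(raw), &envelope)
	return envelope.Data.Repository, err
}

func main() {
	info, err := parseReply(explorerReply)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%s was created on %s and uses %d units of disk\n",
		info.Name, info.CreatedAt.Format("2006-01-02"), info.DiskUsage)
}
```

Having the timestamp as a time.Time makes follow-up work, such as computing the age of a repository, a one-line operation.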
Explorer provides quick tips about what data you can add to the query. To see them, create a new line inside the query and hit Alt + Enter. A scrollable tooltip will be displayed, like the one in Figure 8-14.

Two screenshots of GitHub docs. It has code that includes the owners name, created At, disk usage, and name.

Figure 8-14

Smart tool tip

For more reading on the different data that can be extracted using GraphQL, refer to the queries documentation at https://docs.github.com/en/graphql/reference/queries.

Summary

In this chapter, you looked at an open source project called Scorecard that provides security metrics for projects hosted on GitHub. The project measures the security of a project on a scale of 0 to 10, and the tool can also be used for projects stored locally. The major benefit of the tool is the public availability of data for projects that have been scanned by the tool. This data is useful for developers because it gives them information and insights on the security metrics of projects they are planning to use.

You also looked at how the tool works and learned how to use the GitHub API to extract repository information to perform predefined security checks.

You learned in detail about the two kinds of GitHub APIs, REST and GraphQL. You looked at the sample code to understand how to use each of these APIs to extract information from a GitHub repository. Finally, you explored GitHub Explorer to understand how to construct GraphQL queries for performing query operations on GitHub.
