© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
N. Tolaram, Software Development with Go, https://doi.org/10.1007/978-1-4842-8731-6_14

14. CrowdSec

Nanik Tolaram
Sydney, NSW, Australia
 
In this chapter, you will look at an open source security tool called CrowdSec (https://github.com/crowdsecurity/crowdsec). There are a few reasons why this tool is interesting to study:
  • It uses crowd-sourced data to collect IP information across the globe, which is shared with the community.

  • It offers code designs that are useful to look at and learn from.

  • The GeoIP database is interesting on its own.

The chapter is broken down into an installation part and a learning part. In the installation part, you will install CrowdSec to understand how it works. In the learning part, you will look at how CrowdSec implements a few techniques and study each one through simpler sample code.

Source Code

The source code for this chapter is available from the https://github.com/Apress/Software-Development-Go repository.

CrowdSec Project

The documentation at https://doc.crowdsec.net/docs/intro explains it nicely:

CrowdSec is an open-source and lightweight software that allows you to detect peers with malevolent behaviors and block them from accessing your systems at various levels (infrastructural, system, applicative).

CrowdSec, as an open source security tool, provides quite a number of features that sit nicely in a cloud environment. The thing that is intriguing about the tool is the data that is collected by the community. This crowd-sourced data allows CrowdSec to determine whether a certain IP address has to be banned or should be allowed into your infrastructure.

There are many architectures and code designs that you are going to learn from the project, which you will explore more in the “Learning From CrowdSec” section.

Using CrowdSec

I will not go through the complete installation process of CrowdSec. Rather, I will cover the steps of a bare minimum installation that gives you what you need for the “Learning From CrowdSec” section. The objective is to reach the point where you can see the community data, collected by a central server, replicated to a local database.

Create an empty directory for the following steps. In my local installation, I created a new directory under /home/nanik/GolandProjects/crowdsec. Follow these steps:
  • Download the release from GitHub. For this section, use v1.4.1 for Linux, downloading it using the following command:
    wget https://github.com/crowdsecurity/crowdsec/releases/download/v1.4.1/crowdsec-release.tgz
  • Once downloaded, use gunzip and tar to unzip as follows:
    gunzip ./crowdsec-release.tgz && tar -xvf crowdsec-release.tar
  • A new directory named crowdsec-v1.4.1 will be created, as shown:
    └── crowdsec-v1.4.1
        ├── cmd
        ├── config
        ├── plugins
        ├── test_env.ps1
        ├── test_env.sh
        └── wizard.sh
  • Change your directory to crowdsec-v1.4.1 and run the test_env.sh command.

./test_env.sh
Let the script run. It will take a bit of time because it’s downloading a few things. You will see output that looks like the following:
[07/27/2022:03:50:14 PM][INFO] Creating test arboresence in /home/nanik/GolandProjects/crowdsec/crowdsec-v1.4.1/tests
[07/27/2022:03:50:14 PM][INFO] Arboresence created
[07/27/2022:03:50:14 PM][INFO] Copying needed files for tests environment
[07/27/2022:03:50:15 PM][INFO] Files copied
...
INFO[27-07-2022 03:50:15 PM] Machine 'test' successfully added to the local API
INFO[27-07-2022 03:50:15 PM] API credentials dumped to '/home/nanik/GolandProjects/crowdsec/crowdsec-v1.4.1/tests/config/local_api_credentials.yaml'
INFO[27-07-2022 03:50:15 PM] Wrote new 438269 bytes index to /home/nanik/GolandProjects/crowdsec/crowdsec-v1.4.1/tests/config/hub/.index.json
INFO[27-07-2022 03:50:16 PM] crowdsecurity/syslog-logs : OK
INFO[27-07-2022 03:50:16 PM] Enabled parsers : crowdsecurity/syslog-logs
INFO[27-07-2022 03:50:16 PM] crowdsecurity/geoip-enrich : OK
INFO[27-07-2022 03:50:16 PM] downloading data 'https://crowdsec-statics-assets.s3-eu-west-1.amazonaws.com/GeoLite2-City.mmdb' in '/home/nanik/GolandProjects/crowdsec/crowdsec-v1.4.1/tests/data/GeoLite2-City.mmdb'
INFO[27-07-2022 03:51:25 PM] downloading data 'https://crowdsec-statics-assets.s3-eu-west-1.amazonaws.com/GeoLite2-ASN.mmdb' in '/home/nanik/GolandProjects/crowdsec/crowdsec-v1.4.1/tests/data/GeoLite2-ASN.mmdb'
INFO[27-07-2022 03:51:41 PM] Enabled parsers : crowdsecurity/geoip-enrich
INFO[27-07-2022 03:51:41 PM] crowdsecurity/dateparse-enrich : OK
INFO[27-07-2022 03:51:41 PM] Enabled parsers : crowdsecurity/dateparse-enrich
...
The script creates a new directory called tests containing a complete test environment for CrowdSec. The directory will look like the following:
nanik@nanik:~/GolandProjects/crowdsec/crowdsec-v1.4.1$ tree -L 2 ./tests/
./tests/
├── config
│   ├── acquis.yaml
│   ├── collections
│   ├── crowdsec-cli
│   ├── hub
...
│   ├── scenarios
│   └── simulation.yaml
├── crowdsec
├── cscli
├── data
│   ├── crowdsec.db
│   ├── GeoLite2-ASN.mmdb
│   └── GeoLite2-City.mmdb
├── dev.yaml
├── logs
└── plugins
    ├── notification-email
...
    └── notification-splunk

The directory contains a variety of files, including the CrowdSec command-line tools crowdsec and cscli, along with a folder called data that you will look at in more detail in the next section. The files with the .mmdb extension are the databases that you will look at in detail in the “GeoIP Database” section.

crowdsec.db

CrowdSec stores data inside a SQLite database called crowdsec.db. The database contains a number of tables, shown in Figure 14-1.

A screenshot of the CrowdSec database shows the main portion with the tables alerts, bouncers, decisions, events, machines, meta, sqlite_master, and sqlite_sequence, along with server objects.

Figure 14-1

CrowdSec database

The test environment does not populate any data when the database is created, so you need to set up your environment so that it will sync from a central server. To do this, you first need to register with the CrowdSec server using the cscli tool, as outlined in the doc at https://docs.crowdsec.net/docs/cscli/cscli_capi_register/. Open a terminal, change to the tests directory, and execute the following command:
./cscli capi register -c ./dev.yaml
You will get output like the following:
WARN[27-07-2022 04:10:11 PM] can't load CAPI credentials from './config/online_api_credentials.yaml' (missing field)
INFO[27-07-2022 04:10:11 PM] push and pull to Central API disabled
INFO[27-07-2022 04:10:13 PM] Successfully registered to Central API (CAPI)
INFO[27-07-2022 04:10:13 PM] Central API credentials dumped to './config/online_api_credentials.yaml'
...
The command registers your installation with the central server and populates online_api_credentials.yaml with the registration details, which look like the following:
url: https://api.crowdsec.net/
login: <login_details>
password: <password>
You are now ready to populate your database from the central server. Use the following command:
./crowdsec -c ./dev.yaml
You will see output that looks like the following:
...
INFO[27-07-2022 16:16:45] Crowdsec v1.4.1-linux-e1954adc325baa9e3420c324caabd50b7074dd77
WARN[27-07-2022 16:16:45] prometheus is enabled, but the listen address is empty, using '127.0.0.1'
WARN[27-07-2022 16:16:45] prometheus is enabled, but the listen port is empty, using '6060'
INFO[27-07-2022 16:16:45] Loading prometheus collectors
INFO[27-07-2022 16:16:45] Loading CAPI pusher
INFO[27-07-2022 16:16:45] CrowdSec Local API listening on 127.0.0.1:8081
INFO[27-07-2022 16:16:45] Start push to CrowdSec Central API (interval: 30s)
INFO[27-07-2022 16:16:45] Start pull from CrowdSec Central API (interval: 2h0m0s)
INFO[27-07-2022 16:16:45] Loading grok library /home/nanik/GolandProjects/crowdsec/crowdsec-v1.4.1/tests/config/patterns
INFO[27-07-2022 16:16:46] Loading enrich plugins
INFO[27-07-2022 16:16:46] Successfully registered enricher 'GeoIpCity'
...
INFO[27-07-2022 16:16:46] Loading parsers from 4 files
...
INFO[27-07-2022 16:16:47] capi metrics: metrics sent successfully
INFO[27-07-2022 16:16:47] Start send metrics to CrowdSec Central API (interval: 30m0s)
INFO[27-07-2022 16:16:54] capi/community-blocklist : 0 explicit deletions
INFO[27-07-2022 16:17:15] crowdsecurity/community-blocklist : added 8761 entries, deleted 0 entries (alert:1)

Notice the last log message that says added 8761 entries, which means that it has added 8761 entries into your database. If you are not getting this message, rerun the crowdsec command.

Looking into the decisions table, you will see the populated data, as shown in Figure 14-2.

A screenshot of the decisions table shows columns for the IP address, the date until which a particular IP is banned, and the scenario in which an IP address was detected.

Figure 14-2

Data inside the decisions table

The table contains interesting information:
  • IP addresses that are banned

  • Date until when a particular IP is banned

  • Scenarios when an IP address is detected

You have learned briefly how to set up CrowdSec and you have seen the data it uses. In the next section, you will look at parts of CrowdSec that are interesting. You will look at how certain things are implemented inside CrowdSec and then look at a simpler code sample of how to do it.

Learning From CrowdSec

CrowdSec as a project is quite complex and it contains a lot of different things that are very interesting to learn from. In this section, you will pick up a few topics that are used inside CrowdSec that are useful to learn. These topics can also be applied when designing your own software with Go.

System Signal Handling

As a system, CrowdSec provides an extensive list of features that are broken down into several different modules. The reason for features to be broken down into modules is to make it easy for development, maintenance, and testing. When building a system, one of the key things to remember is to make sure all the different modules can be gracefully terminated and all resources such as memory, network connections, and disk space are released. To make sure that different parts of the system shut down properly, you need some sort of coordinated communication to understand when modules need to prepare for the shutdown process.

Imagine a scenario where you are designing an application and it is terminated by the operating system because of some resource constraint. The application must be aware of this and have the capability to shut down all the different modules independently before shutting itself down permanently. You will look at an example on how this is done using the code sample in the chapter14/signalhandler folder.

Open your terminal and run the sample as follows:
go run main.go
The application will keep running, printing loop messages on the terminal, until you stop it by pressing Ctrl+C. The output will look like the following:
2022/07/24 22:31:32 loop1000Times -  0
2022/07/24 22:31:32 loop100Times -  0
2022/07/24 22:31:32 loop100Times -  1
...
2022/07/24 22:31:32 loop1000Times -  14
2022/07/24 22:31:33 loop1000Times -  15
2022/07/24 22:31:33 loop100Times -  15
^C2022/07/24 22:31:33 SIGTERM received
2022/07/24 22:31:33 loop1000Times - quit
2022/07/24 22:31:33 loop100Times - quit
2022/07/24 22:31:33 Complete!
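The application shuts down gracefully because the Ctrl+C key combination is detected. Ctrl+C delivers SIGINT; the output shows that the sample logs the message SIGTERM received for it, since the sample handles all three registered signals in a single case. Before going through the code, Figure 14-3 shows the app design. Use Figure 14-3 as guidance when you walk through the sample code.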

A flowchart of the signal handling sample: main.go registers the signal channel, starts the loop100Times and loop1000Times functions, and then waits for them to finish.

Figure 14-3

CrowdSec system signal handling

The following code snippet shows the registration of system interruption events using Go’s built-in os/signal package (step 1). The function signal.Notify(..) is called with the channel that will receive the signals and the list of signals to listen for. In the sample code, you register SIGHUP, SIGTERM, and SIGINT.
func main() {
  signalChan := make(chan os.Signal, 1)
  signal.Notify(signalChan,
     syscall.SIGHUP,
     syscall.SIGTERM,
     syscall.SIGINT)
  ...
  go func() {
     for {
        s := <-signalChan
        switch s {
        case syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM:
           ...
        }
     }
  }()
  ...
}
The following explains the meaning of the signals:
  • SIGHUP: The operating system sends this signal when the terminal used to execute the application is disconnected, closed, or broken.

  • SIGTERM: This is a generic signal that is used by the operating system to signal terminating a process or application.

  • SIGINT: This is also referred to as a program interrupt and this signal occurs when the Ctrl+C combination is detected.

The code listens to all these signals to ensure that if any of them are detected, it will do its job to shut itself down properly.

The signalChan variable is a channel that accepts os.Signal values, and it is passed as a parameter when calling signal.Notify(). The goroutine handles the signals delivered by the library inside a for{} loop (step 2). Receiving a signal (step 6) means that there is an interruption, so the code must take the necessary steps to start the shutdown process (step 7).

Now that the code is ready to receive the system event and it knows what it is supposed to do when it receives it, let’s take a look at how other modules/goroutines are informed about this. The sample code spawns two goroutines, as shown here:
func main() {
  ...
  wg.Add(2)
  go loop100Times(stop, &wg)
  go loop1000Times(stop, &wg)
  wg.Wait()
  log.Println("Complete!")
}
loop100Times and loop1000Times are called as goroutines (step 3) and are passed two parameters, stop and wg. The stop variable is a channel variable that is used by the goroutine function to know when it needs to stop processing. The following code snippet shows the code that closes the stop channel:
func main() {
  ...
  go func() {
     for {
        ...
        switch s {
        case syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM:
           ...
           close(stop)
           ...
        }
     }
  }()
  ...
}
The close(stop) call closes the channel. A receive from a closed channel returns immediately, so every part of the application that is checking this channel will detect that it has been closed and can act on it. The checking of the stop channel can be seen in the following code snippet:
func loop100Times(stop <-chan string, wg *sync.WaitGroup) {
  ...
  for {
     select {
     case <-stop:
        log.Println("loop100Times - quit")
        return
     default:
        ...
     }
  }
}

The loop100Times function runs inside a for{} loop where it checks the channel inside the select{} statement. To make it easy to understand, the for{ select {} } block of code translates to the following:

Keep on looping, and on every iteration do the following:
  • Can anything be read from the stop channel, which is the case once it has been closed? If so, processing must stop.

  • Otherwise, just print to the console and increment the counter.

The same logic is used inside the loop1000Times function, so it works exactly the same way. Both functions stop processing and print a quit message to the terminal once the stop channel is closed. The application shuts itself down gracefully by informing the different parts of the code that it is shutting down.
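The reason a single close(stop) can inform every goroutine is that a receive from a closed channel never blocks, no matter how many times or from how many places it is attempted. The following minimal sketch isolates that behavior. The stopped helper is hypothetical, and the sketch uses chan struct{} rather than the chan string of the sample because no value is ever sent on the channel.

```go
package main

import "fmt"

// stopped reports, without blocking, whether the stop channel has been
// closed. It uses the same select/default shape as the loop functions.
func stopped(stop <-chan struct{}) bool {
	select {
	case <-stop:
		// Reached only once stop is closed, since nothing is ever sent.
		return true
	default:
		// The channel is still open and empty: keep working.
		return false
	}
}

func main() {
	stop := make(chan struct{})

	fmt.Println("before close:", stopped(stop))

	close(stop)

	// Every check after the close succeeds, so any number of goroutines
	// can observe the same shutdown request.
	fmt.Println("after close:", stopped(stop))
	fmt.Println("second check:", stopped(stop))
}
```

Sending a value on the channel instead of closing it would wake up only one receiver, which is why closing is the usual choice when several goroutines must be told to stop.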

The last thing you are going to look at is the wait state (step 4). Now the different goroutines know when to shut down, but the application can only completely shut itself down after all the goroutines complete their processes. This is made possible by the use of sync.WaitGroup. The following code snippet shows the usage of WaitGroup:
package main
import (
  ...
)
func main() {
  ...
  var wg sync.WaitGroup
  ...
  wg.Add(2)
  go loop100Times(stop, &wg)
  go loop1000Times(stop, &wg)
  wg.Wait()
  log.Println("Complete!")
}
func loop100Times(stop <-chan string, wg *sync.WaitGroup) {
  ...
  defer wg.Done()
  for {
     ...
  }
}
func loop1000Times(stop <-chan string, wg *sync.WaitGroup) {
  ...
  defer wg.Done()
  for {
     ...
  }
}
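The interplay between wg.Add, wg.Done, and wg.Wait can be checked with a small self-contained sketch. The runWorkers function and its counter are hypothetical and exist only to make the result observable: each worker blocks until the stop channel is closed, and the caller returns only after every worker has called Done.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runWorkers starts n goroutines that wait for the stop channel to be
// closed, closes it, and then waits for all of them. It returns how many
// workers finished before wg.Wait returned.
func runWorkers(n int) int {
	var wg sync.WaitGroup
	var finished int32
	stop := make(chan struct{})

	// Register all workers before starting them, as the sample does
	// with wg.Add(2).
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			// Done runs when the goroutine returns, after the counter
			// has been incremented.
			defer wg.Done()
			<-stop
			atomic.AddInt32(&finished, 1)
		}()
	}

	close(stop) // ask every worker to stop
	wg.Wait()   // block until every worker has called Done

	return int(atomic.LoadInt32(&finished))
}

func main() {
	fmt.Println("workers finished:", runWorkers(2))
	fmt.Println("Complete!")
}
```

Without wg.Wait(), main could return while the workers are still running, and their cleanup would be cut short.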

Handling Service Dependencies

Complex applications like CrowdSec have multiple services that run at the same time or at scheduled times. In order for services to run properly, there needs to be service coordination that takes care of the dependencies between services.

Figure 14-4 shows how the service coordination is done inside CrowdSec using a channel.

A flowchart of the service coordinator starts from the main.go then startRunSvc after that classified into two categories named serve, serve Prometheus, serveAPIServer, and wait for the signal.

Figure 14-4

Service coordination

In Figure 14-4, the apiReady channel is the central part of the service coordination when CrowdSec starts up. The diagram shows that the apiServer.Run function sends a signal to the apiReady channel, which allows the other service, servePrometheus, to run the server listening on port 6060.

The following code snippet shows the StartRunSvc function running servePrometheus as a goroutine and passing in the apiReady channel. It also passes the same channel when the Serve function is called:
package main
import (
  "os"
  ...
)
func StartRunSvc() error {
  ...
  apiReady := make(chan bool, 1)
  agentReady := make(chan bool, 1)
  // Enable profiling early
  if cConfig.Prometheus != nil {
     ...
     go servePrometheus(cConfig.Prometheus, dbClient, apiReady, agentReady)
  }
  return Serve(cConfig, apiReady, agentReady)
}
The servePrometheus function starts running the server to listen on port 6060 only when it is able to read the value from the apiReady channel (<- apiReady), as shown in the following snippet:
func servePrometheus(config *csconfig.PrometheusCfg, dbClient *database.Client, apiReady chan bool, agentReady chan bool) {
  ...
  <-apiReady
  ...
  if err := http.ListenAndServe(fmt.Sprintf("%s:%d", config.ListenAddr, config.ListenPort), nil); err != nil {
     log.Warningf("prometheus: %s", err)
  }
}
A value is sent to the apiReady channel only once the CrowdSec API server is starting up, as shown in the following code snippet. The serveAPIServer function spawns another goroutine that calls the apiServer.Run(..) function, which sends a value to the apiReady channel when the API server starts up.
func serveAPIServer(apiServer *apiserver.APIServer, apiReady chan bool) {
  apiTomb.Go(func() error {
     ...
     go func() {
        ...
        if err := apiServer.Run(apiReady); err != nil {
           log.Fatalf(err.Error())
        }
     }()
     ...
  })
}
func (s *APIServer) Run(apiReady chan bool) error {
  ...
  s.httpServerTomb.Go(func() error {
     go func() {
        apiReady <- true
        ...
     }()
     ...
  })
  return nil
}
Let’s take a look at a simpler version of the service coordination, which is in the chapter14/services folder. The sample code demonstrates how to use service coordination between two different services, serviceA and serviceB. Open a terminal, make sure you are in the chapter14/services directory, and run the code as follows:
go run main.go
You will get output like the following:
2022/07/26 20:40:20 ....Starting serviceB
2022/07/26 20:40:21 ....Done with serviceB
2022/07/26 20:40:21 ..Starting serviceA
2022/07/26 20:40:23 ..Done with serviceA
Because the services run as goroutines, the exact order of the lines printed on your console may vary; however, serviceA will always start its work only after serviceB has signaled that it is done. The following code runs the services as goroutines:
func main() {
  serviceBDone := make(chan bool, 1)
  alldone := make(chan bool, 1)
  go serviceB(serviceBDone)
  go serviceA(serviceBDone, alldone)
  <-alldone
}
There are two channels created by the sample app. Let’s take a look at the function of each channel:
  • serviceBDone: This channel is used to inform that serviceB has done its job.

  • alldone: This channel is used to inform that serviceA has done its job so the application can exit.

The following code snippet shows the serviceA and serviceB functions:
func serviceB(serviceBDone chan bool) {
  ...
  serviceBDone <- true
  log.Println("....Done with serviceB")
}
//2nd service
func serviceA(serviceBDone chan bool, finish chan bool) {
  <-serviceBDone
  ...
  log.Println("..Done with serviceA")
  finish <- true
}
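The ordering guarantee can be made observable with a short sketch that records events instead of printing them. The eventLog type and the runServices function are hypothetical. Unlike the sample, serviceB records its message before sending on the channel, which makes the recorded order fully deterministic.

```go
package main

import (
	"fmt"
	"sync"
)

// eventLog collects events from several goroutines in arrival order.
type eventLog struct {
	mu     sync.Mutex
	events []string
}

func (l *eventLog) add(e string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.events = append(l.events, e)
}

func (l *eventLog) snapshot() []string {
	l.mu.Lock()
	defer l.mu.Unlock()
	return append([]string(nil), l.events...)
}

// runServices runs two services where serviceA depends on serviceB and
// returns the order in which their events were recorded.
func runServices() []string {
	l := &eventLog{}
	serviceBDone := make(chan bool, 1)
	allDone := make(chan bool, 1)

	// serviceB: do the work, record it, then signal completion.
	go func() {
		l.add("serviceB done")
		serviceBDone <- true
	}()

	// serviceA: wait for serviceB before doing anything.
	go func() {
		<-serviceBDone
		l.add("serviceA start")
		l.add("serviceA done")
		allDone <- true
	}()

	<-allDone
	return l.snapshot()
}

func main() {
	for _, e := range runServices() {
		fmt.Println(e)
	}
}
```

In the sample, serviceB logs its Done message after the send, so that line can appear before or after serviceA's Starting line; the dependency itself is still respected.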

GeoIP Database

CrowdSec uses a GeoIP database that contains geographical information of an IP address. This database is downloaded as part of setting up the test environment discussed in the “Using CrowdSec” section.

In this section, you will look into this database and learn how to read the data from the database. One of the use cases for this database is the ability to build a security tool for your infrastructure to label each incoming IP, which is useful to monitor and understand the incoming traffic to your infrastructure. The GeoIP database comes from the following website: https://dev.maxmind.com/geoip/geolite2-free-geolocation-data?lang=en#databases. Read through the website to get an understanding of the licensing.

The sample code is inside the chapter14/geoip/city folder, but before running it, you need to specify the location of the GeoIP database that the code will use. If you followed the “Using CrowdSec” section, you will have a database file called GeoLite2-City.mmdb inside the data folder. Copy the location of the file and use it inside the code, as shown in the following snippet, which uses my file location.
package main
...
func main() {
  db, err := maxminddb.Open("/home/nanik/GolandProjects/cloudprogramminggo/chapter14/geoip/city/GeoLite2-City.mmdb")
  ...
}
Once the file location has been specified, open a terminal and run the sample as follows:
go run main.go
You will see output like the following:
IP : 2.0.0.0/17, Long : 2.338700, Lat : 48.858200, Country : FR, Continent: EU
IP : 2.0.128.0/19, Long : -0.947200, Lat : 47.171600, Country : FR, Continent: EU
...
IP : 2.0.192.0/18, Long : 2.338700, Lat : 48.858200, Country : FR, Continent: EU
IP : 2.1.0.0/19, Long : 2.338700, Lat : 48.858200, Country : FR, Continent: EU
IP : 2.1.32.0/19, Long : 2.302200, Lat : 44.858601, Country : FR, Continent: EU
...

The code reads the database to get all networks in the 2.0.0.0/8 IP range and prints each network found in that range along with its location, country, and continent information. Let’s go through the code and understand how it uses the database.

The data is stored in a single, efficiently packed file, so in order to read the database, you must use a library. The sample uses the github.com/oschwald/maxminddb-golang library. The documentation of the library can be found at https://pkg.go.dev/github.com/oschwald/maxminddb-golang.

The library provides a function to convert the data into a struct. In the sample code, you create your own struct to represent the data that will be read.
package main
...
type GeoCityRecord struct {
  Continent struct {
     Code      string                 `json:"code"`
     GeonameId int                    `json:"geoname_id"`
     Names     map[string]interface{} `json:"names"`
  } `json:"continent"`
  Country struct {
     GeonameId int                    `json:"geoname_id"`
     IsoCode   string                 `json:"iso_code"`
     Names     map[string]interface{} `json:"names"`
  } `json:"country"`
  Location struct {
     AccuracyRadius int     `json:"accuracy_radius"`
     Latitude       float32 `json:"latitude"`
     Longitude      float32 `json:"longitude"`
     TimeZone       string  `json:"time_zone"`
  } `json:"location"`
  RegisteredCountry struct {
     GeoNameID int                    `json:"geoname_id"`
     IsoCode   string                 `json:"iso_code"`
     Names     map[string]interface{} `json:"names"`
  } `json:"registered_country"`
}
func main() {
  ...
}
The GeoCityRecord struct will be populated when calling the library to read the data, as shown here:
package main
import (
  ...
)
...
func main() {
  ...
  _, network, err := net.ParseCIDR("2.0.0.0/8")
  ...
  for networks.Next() {
     var rec interface{}
     r := GeoCityRecord{}
     ip, err := networks.Network(&rec)
     ...
  }
}

networks.Next() loops through the records found and reads all geographical information from the database by calling the networks.Network(..) function, which populates the rec variable.
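The range that the sample iterates over comes from net.ParseCIDR in the standard library. As a minimal sketch that needs no database file, the following shows what that call returns and how to test whether an address falls inside the range. The inRange helper and the example addresses are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// inRange reports whether addr lies inside the network described by the
// CIDR string, for example "2.0.0.0/8".
func inRange(cidr, addr string) (bool, error) {
	// ParseCIDR returns the IP, the network (base address plus mask),
	// and an error; only the network is needed here.
	_, network, err := net.ParseCIDR(cidr)
	if err != nil {
		return false, err
	}
	ip := net.ParseIP(addr)
	if ip == nil {
		return false, fmt.Errorf("invalid IP address %q", addr)
	}
	return network.Contains(ip), nil
}

func main() {
	for _, addr := range []string{"2.1.32.7", "8.8.8.8"} {
		ok, err := inRange("2.0.0.0/8", addr)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s in 2.0.0.0/8: %t\n", addr, ok)
	}
}
```

A /8 prefix fixes only the first octet, so every address that starts with 2. belongs to the range used in the sample.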

The rec variable is an empty interface, so the code uses json.Marshal(..) to convert its content to JSON and then json.Unmarshal(..) to load that JSON into a proper struct, the r variable, as shown here:
package main
...
func main() {
  ...
  for networks.Next() {
     var rec interface{}
     r := GeoCityRecord{}
     ip, err := networks.Network(&rec)
     ...
     j, _ := json.Marshal(rec)
     // j is already a []byte, so it can be passed to Unmarshal directly
     err = json.Unmarshal(j, &r)
     ...
     fmt.Printf("IP : %s, Long : %f, Lat : %f, Country : %s, Continent: %s\n", ip.String(), r.Location.Longitude, r.Location.Latitude,
        r.Country.IsoCode, r.Continent.Code)
  }
}

Once the JSON has been unmarshalled into the r variable, the code prints out the information to the console.
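The JSON round trip can be tried without the database by feeding it a hand-written value. In the following minimal sketch, the cityRecord type is a trimmed-down, hypothetical version of GeoCityRecord, and the map in main is an assumed stand-in for what a lookup would place in the rec variable.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cityRecord keeps only a few of the fields from GeoCityRecord.
type cityRecord struct {
	Country struct {
		IsoCode string `json:"iso_code"`
	} `json:"country"`
	Location struct {
		Latitude  float64 `json:"latitude"`
		Longitude float64 `json:"longitude"`
	} `json:"location"`
}

// toRecord converts a loosely typed value into a cityRecord by encoding
// it as JSON and decoding that JSON into the struct.
func toRecord(raw interface{}) (cityRecord, error) {
	var r cityRecord
	j, err := json.Marshal(raw)
	if err != nil {
		return r, err
	}
	err = json.Unmarshal(j, &r)
	return r, err
}

func main() {
	// Hand-written stand-in for the data read from the database.
	raw := map[string]interface{}{
		"country":  map[string]interface{}{"iso_code": "FR"},
		"location": map[string]interface{}{"latitude": 48.8582, "longitude": 2.3387},
	}

	r, err := toRecord(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("Country : %s, Lat : %f, Long : %f\n",
		r.Country.IsoCode, r.Location.Latitude, r.Location.Longitude)
}
```

The round trip is simple but costs an extra encode and decode per record. The library's documentation also describes decoding directly into a struct by using maxminddb struct tags, which would avoid the JSON step; that is a different route from the one the sample takes.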

Summary

In this chapter, you not only looked at the crowd-sourced nature of the data collection used by CrowdSec and how the community benefits from it, but you also learned how to use it in your application.

You learned how to use channels to inform applications when system signals are sent by the operating system. You also looked at using channels to handle service dependencies during startup. Lastly, you looked at how to read a GeoIP database, which is useful to know when you want to use the information in your infrastructure for logging or monitoring IP traffic purposes.
