Chapter 1. Introducing Falco

Now it’s time to understand Falco a little bit better. Don’t worry, we’ll take it easy! We will first look at what Falco does, including a high-level view of its functionality and an introductory description of each of its components. We’ll follow that with an explanation of the design principles that inspired Falco and still guide its development. After that, we’ll explain what you can do with Falco, what is outside its domain, and what you can better accomplish with other tools. Finally, we’ll cover some historical context that will help put things into perspective and explain the motivation that drove the development of the tool.

Falco in a Nutshell

At the highest level, Falco is pretty straightforward: you deploy it by installing multiple sensors across a distributed infrastructure. Each sensor collects data (from the local machine or by talking to some API), runs a set of rules against it and notifies you if something bad happens.

Take a look at Figure 1-1 for a high-level diagram of how it works.

Figure 1-1. Falco’s high-level architecture

You can think of Falco as a network of security cameras for your infrastructure: you place the sensors in key locations, they observe what’s going on, and they ping you if they detect harmful behavior. With Falco, bad behavior is defined by a set of rules that the community created and maintains for you, and that you can customize for your needs.

Sensors

The architecture of a Falco sensor is shown in Figure 1-2.

Figure 1-2. Falco sensor architecture

The sensor consists of an engine that has two inputs: a data source and a set of rules. The sensor applies the rules to each event coming from the data source. When a rule matches an event, an output message is produced. Very straightforward, right?

Data Sources

Each sensor is able to collect input data from a number of sources. Originally, Falco was designed to operate exclusively on system calls, which to date remain one of its most important data sources. We’ll cover system calls in detail in Chapters 4 and 5, but for the moment, think of them as what a running program uses to interface with the outside world. Opening or closing a file, establishing or receiving a network connection, reading and writing data to the disk or to the network, executing commands, and communicating with other processes using pipes or other types of interprocess communication are all examples of system call usage.

Falco collects system calls by instrumenting the kernel of the Linux operating system. It can do this in two different ways: by deploying a kernel module, i.e., a binary that can be installed in the operating system kernel to extend the kernel’s functionality, or by using a technology called eBPF, which allows running programs that safely perform actions inside the OS. We’ll talk extensively about kernel modules and eBPF in Chapter 5.

Tapping into this data gives Falco incredible visibility into everything that is happening in your infrastructure. Here are some examples of things Falco can detect for you:

  • Privilege escalations

  • Access to sensitive data

  • Ownership and mode changes

  • Unexpected network connections or socket mutations

  • Unwanted program execution

  • Data exfiltration

  • Compliance violations
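To make one item on this list concrete, here is a minimal sketch of how access to sensitive data might be expressed as a rule. The rule name and the choice of /etc/shadow as the sensitive file are assumptions made for this example. The fields used (evt.type, fd.name, proc.name, user.name) are standard Falco fields, but check the field reference for your Falco version before relying on them.

```yaml
# Illustrative sketch only: the rule name and the file path are assumptions.
- rule: read_shadow_file_example
  desc: a process opened /etc/shadow
  # Match file-open system calls whose target is the sensitive file.
  condition: evt.type in (open, openat) and fd.name = /etc/shadow
  output: sensitive file opened (user=%user.name process=%proc.name file=%fd.name)
  priority: WARNING
```

A production rule would normally add exceptions for programs that legitimately read this file, which is one reason the default ruleset is more elaborate than this sketch.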

But Falco is not limited to system calls. It has been extended to tap into other data sources (we’ll show you examples throughout the book). For example, Falco can monitor your cloud logs in real time and notify you when something bad happens in your cloud infrastructure. Here are some more examples of things it can detect for you:

  • A user logs in without multifactor authentication

  • A cloud service configuration is modified

  • Somebody accesses one or more sensitive files in an S3 bucket
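As a hedged illustration of the first item, the sketch below shows what a rule over cloud logs could look like. It is modeled on the style of the rules that accompany Falco’s AWS CloudTrail plugin. The rule name is hypothetical, and the field names (ct.name, ct.user, ct.srcip, ct.region, and the json.value lookup) as well as the aws_cloudtrail source name are assumptions that you should verify against the plugin’s documentation.

```yaml
# Illustrative sketch only: field names and the source name are assumptions
# to be checked against the CloudTrail plugin documentation.
- rule: console_login_without_mfa_example
  desc: a user logged in to the AWS console without multifactor authentication
  condition: >
    ct.name = "ConsoleLogin"
    and json.value[/additionalEventData/MFAUsed] = "No"
  output: console login without MFA (user=%ct.user source_ip=%ct.srcip region=%ct.region)
  priority: CRITICAL
  # Unlike system call rules, this rule is bound to a different data source.
  source: aws_cloudtrail
```

The structure is the same as that of a system call rule. Only the fields and the data source change.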

New data sources are added to Falco frequently, so we recommend checking the website and Slack channel to keep up with what’s new.

Rules

Rules tell the Falco engine what to do with the data coming from the sources. They allow the user to define policies in a compact and readable format. Falco comes pre-loaded with a comprehensive set of rules that cover host, container, Kubernetes and cloud security. However, you can easily create your own rules to customize Falco. We’ll spend a lot of time on rules, in particular in Chapter 14; by the time you’re done reading this book, you’ll be a total master at them. For the moment, here’s an example to whet your appetite:

- rule: shell_in_container
  desc: shell opened inside a container
  condition: container.id != host and proc.name = bash
  output: shell in a container (user=%user.name container_id=%container.id)
  priority: WARNING

This rule detects when a bash shell is started inside a container, which is normally not a good thing in an immutable container-based infrastructure. The core fields in a rule are the condition, which tells Falco what to look at, and the output, which is what Falco will tell you when the condition triggers. As you can see, both the condition and the output act on fields, one of the core concepts in Falco. The condition is a boolean expression that combines checks of fields against values; it is essentially a filter. The output is a printf-like line in which field values can be printed out.
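To show how field checks combine into a larger boolean expression, here is a hedged variation on the rule above. The rule name, the set of shells, and the restriction to the root user are assumptions chosen for illustration. The in operator and the proc.cmdline field are part of Falco’s standard filter syntax.

```yaml
# Illustrative sketch only: a variation on shell_in_container.
- rule: root_shell_in_container_example
  desc: one of several shells opened by root inside a container
  # Three field checks joined with "and"; "in" compares against a set of values.
  condition: container.id != host and proc.name in (bash, sh, zsh) and user.name = root
  output: root shell in a container (shell=%proc.name command=%proc.cmdline container_id=%container.id)
  priority: WARNING
```

Each %field token in the output is replaced with the value of that field for the event that triggered the rule.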

Does this remind you of networking tools, like tcpdump or Wireshark? Good eye: they were a big inspiration for Falco.

Data Enrichment

Rich data sources and a flexible rule engine help make Falco a powerful runtime security tool. On top of that, metadata from a disparate set of sources enriches its detections.

When Falco tells you that something has happened, for example that a system file has been modified, you typically need more information to understand the cause and the scope of the issue. Which process did that? Did it happen in a container? If so, what were the container and image names? What was the service/namespace where this happened? Was it in production or in dev? Was this a change made by root?

Falco’s data enrichment engine helps answer all of these questions by automatically attaching context to detections. It also lets you express much richer rule conditions that include this metadata: for example, you can easily scope a rule so that it only triggers in production or in a specific service. Falco accomplishes this by integrating with Kubernetes, keeping track of what runs in the cluster, and letting you use this information in rule conditions and outputs.
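As a sketch of what scoping a rule with this metadata can look like, the following example restricts the earlier shell detection to a single Kubernetes namespace. The rule name and the namespace name production are assumptions for illustration. The k8s.ns.name, k8s.pod.name, and container.image.repository fields are standard Falco fields, populated when container and Kubernetes metadata is available.

```yaml
# Illustrative sketch only: the namespace name "production" is an assumption.
- rule: shell_in_production_container_example
  desc: shell opened inside a container running in the production namespace
  # The last check uses enrichment metadata rather than raw system call data.
  condition: container.id != host and proc.name = bash and k8s.ns.name = production
  output: shell in a production container (user=%user.name pod=%k8s.pod.name namespace=%k8s.ns.name image=%container.image.repository)
  priority: WARNING
```

The same metadata fields appear in the output, so the alert already answers several of the questions listed earlier: which pod, which namespace, and which image.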

Outputs

Every time a rule is triggered, the corresponding engine emits an output notification. In the simplest possible configuration, the engine writes the notification to standard output (which, as you can imagine, usually isn’t very useful). Fortunately, Falco offers sophisticated ways to route outputs and direct them to a bunch of places, including log collection tools, cloud storage services like S3, and communication tools like Slack and email. Its ecosystem includes a fantastic project called Falcosidekick, specifically designed to connect Falco to the world and make output collection effortless (see Chapter 13 for more).
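For a rough idea of what output routing looks like in practice, here is a hedged fragment of Falco’s configuration file, falco.yaml. The keys shown are standard output settings, but the file path and the URL are assumptions: the URL presumes a Falcosidekick instance (the project mentioned above) listening locally on its default port.

```yaml
# Fragment of falco.yaml (illustrative sketch; values are assumptions).
json_output: true        # emit alerts as JSON, which collectors parse more easily

stdout_output:
  enabled: true          # the simplest configuration: print to standard output

file_output:
  enabled: true
  keep_alive: false
  filename: /var/log/falco_events.log   # hypothetical path

http_output:
  enabled: true
  url: http://localhost:2801/           # assumed address of a Falcosidekick instance
```

Several outputs can be enabled at once, so the same alert can go to a local file and to a remote collector.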

Containers and More

Falco was designed for the modern world of cloud native applications, so it has excellent, out-of-the-box support for containers, Kubernetes, and the cloud. Since this book is about cloud native, we will mostly focus on that, but keep in mind that Falco is not limited to containers and Kubernetes running in the cloud. You can absolutely use it as a host security tool, and many of its preloaded rules can help you secure your fleet of Linux servers. Falco also has good support for network detections, allowing you to inspect the activity of connections, IP addresses, ports, clients, and servers and receive alerts when they show unwanted behavior.
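To illustrate a network detection, here is a minimal sketch of a rule that flags outbound connections to an SSH port from inside a container. The rule name and the choice of port 22 are assumptions for the example. The evt.type, fd.sport (the server-side port of the connection), and fd.name fields are standard Falco fields.

```yaml
# Illustrative sketch only: rule name and port are assumptions.
- rule: outbound_ssh_from_container_example
  desc: a process in a container connected to a remote SSH port
  # connect is the system call a client uses to open a connection.
  condition: evt.type = connect and container.id != host and fd.sport = 22
  output: outbound SSH connection from a container (process=%proc.name connection=%fd.name container_id=%container.id)
  priority: NOTICE
```

For a socket, fd.name prints the connection tuple, so the alert shows both endpoints of the connection.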

Falco’s Design Principles

Now that you understand what Falco does, let’s talk about why it is the way it is.

When you’re developing a piece of software of non-negligible complexity, it’s important to focus on the right use cases and prioritize the most important goals. Sometimes that means accepting tradeoffs. Falco is no exception. Its development has been guided by a core set of principles. In this section we will learn why they were chosen and how each of them is reflected in Falco’s architecture and feature set. Understanding these principles will allow you to judge whether Falco is a good fit for your use cases and help you get the most out of it.

Specialized for Runtime

The Falco engine is designed to detect threats while your services and applications are running. When it detects unwanted behavior, Falco should alert you instantly (at most in a matter of seconds), so you can be informed (and react!) right away, not after minutes or hours.

This design principle manifests in three important architectural choices. First, the Falco engine is engineered as a streaming engine, able to process data quickly as it arrives rather than storing it and acting on it later. Second, it’s designed to rely on limited state, which means correlating different events, even if feasible, is not a primary goal and is discouraged. Third, it evaluates rules as close as possible to the data source. If possible, Falco avoids transporting information before processing it and favors deploying richer engines on the endpoints.

Suitable for Production

You should be able to deploy Falco in any environment, including production environments where stability and low overhead are of paramount importance. It should not crash your apps and should strive to slow them down as little as possible.

This design principle affects the data collection architecture, particularly when Falco runs on endpoints that have many processes or containers. The Falco kernel module and eBPF probe have undergone many iterations and years of testing to guarantee their performance and stability. Collecting data by tapping into the kernel of the operating system, as opposed to instrumenting the monitored processes/containers, guarantees that your applications won’t crash because of bugs in Falco.

The Falco engine is written in C++ and employs many expedients to reduce resource consumption. For example, it avoids processing system calls that read or write disk or network data. In some ways this is a limitation, because it prevents users from creating rules that inspect the content of payloads, but it also ensures that CPU and memory consumption stay low, which is more important.

Optimized to Run at the Edge

Compared to other policy engines (for example, OPA), Falco has been explicitly designed with a distributed, multisensor architecture in mind. Its sensor is designed to be lightweight, efficient, and portable, and to operate in diverse environments. It can be deployed on a physical host, in a virtual machine, or as a container. The Falco binary is built for multiple platforms, including ARM.

Avoids Moving and Storing a Ton of Data

Most currently marketed threat-detection products are based on sending a lot of events to a centralized security information and event management (SIEM) tool and then performing analytics on top of the collected data.

Falco is designed around a very different principle: stay as close as possible to the endpoint, perform detections in place, and only ship alerts to a centralized collector. This approach results in a solution that is a bit less capable at performing complex analytics, but is simple to operate, much more cost-effective and scales very well horizontally.

Scalable

Speaking of scale: another important design goal underlying Falco is to be able to scale to support the biggest infrastructures in the world. If you can run it, Falco should be able to secure it.

As we’ve just described, limited state and avoiding centralized storage are important elements of this. Edge computing matters too, since distributing rule evaluation is the only approach that lets a tool like Falco scale truly horizontally.

Another important element of scalability is endpoint instrumentation. Falco’s data collection doesn’t use techniques like sidecars, library linking, or process instrumentation. The reason is that the resource utilization of all of these techniques grows with the number of containers, libraries, or processes to monitor. Busy machines have many containers, libraries, and processes, too many for these techniques to work—but they have only one operating system kernel. Capturing system calls in the kernel means that you only need one Falco sensor per machine, no matter how big the machine is. This makes it possible to run Falco on big hosts with a lot of activity.

Truthful

One other benefit of using system calls as a data source? System calls never lie. Falco is hard to evade because the mechanism it uses to collect data is very hard to disable or circumvent. If you try to evade or circumvent it, you will leave traces that Falco can capture.

Sane Defaults, Richly Extensible

Another key design goal was minimizing the time it takes to extract value from Falco by just installing it; you shouldn’t need to customize it unless you have advanced requirements.

Whenever the need for customization does arise, though, Falco offers flexibility. For example, you can create new rules through a rich and expressive syntax, develop and deploy new data sources that expand the scope of detections, and integrate Falco with your desired notification and event collection tools.

Simple

Simplicity is the last design choice, but it’s also one of the most important ones.

The Falco rule syntax is designed to be compact, easy to read, and simple to learn. Whenever possible, a Falco rule condition should fit in a single line. Anyone, not only experts, should be able to write a new rule or modify an existing one. It is OK if this reduces the expressiveness of the syntax: Falco is in the business of delivering an efficient security rule engine, not a full-fledged domain-specific language. There are better tools for that.

Simplicity is also evident in the processes of extending Falco to alert on new data sources and of integrating it with a new cloud service or type of container, which is a matter of writing a plugin in any language, including Go, C, and C++. Falco loads these plugins easily.

What You Can Do with Falco

Falco shines at detecting threats, intrusions and data theft at runtime and in real time. It works well with legacy infrastructures but excels at supporting containers, Kubernetes and cloud infrastructures. It secures both workloads (processes, containers, services) and infrastructure (hosts, VMs, network, cloud infrastructure and services). It is designed to be lightweight, efficient and scalable and to be used in both development and production. It can detect many classes of threats, but should you need more, you can customize it. It also has a thriving community that supports it and keeps enhancing it.

What You Cannot Do with Falco

No single tool can solve all your problems. Knowing what you cannot do with Falco is as important as knowing where to use it.

As with any tool, there are tradeoffs. First of all, Falco is not a general-purpose policy language: it doesn’t offer the expressiveness of a full programming language and cannot perform correlation across different engines. Its rule engine, instead, is designed to apply relatively stateless rules at high frequency in many places around your infrastructure. If you are looking for a powerful centralized policy language, we suggest you take a look at OPA.

Second, Falco is not designed to store the data it collects in a centralized repository and let you perform analytics on it. Rule validation is performed at the endpoint and only the alerts are sent to a centralized location. If your focus is advanced analytics and big data querying, we recommend that you use one of the many log collection tools available on the market.

Finally, for efficiency reasons, Falco does not inspect network payloads. Therefore, it’s not the right tool to implement layer 7 (L7) security policies. A traditional network-based intrusion detection system (IDS) or L7 firewall is a better choice for such a use case.

Background and History

The authors of this book have been part of some of Falco’s history, and this final section brings you our memories and perspectives. If you are only interested in operationalizing Falco, feel free to skip the rest of this chapter. However, we believe that knowing where Falco comes from can give you useful context for its architecture that will ultimately help you use it better. Plus, it’s a fun story!

Network Packets: BPF, libpcap, tcpdump, and Wireshark

During the late 1990s Internet boom, computer networks were exploding in popularity. So was the need to observe, troubleshoot, and secure them. Unfortunately, many operators couldn’t afford the network visibility tools available at that time, which were all commercially offered and very expensive. As a consequence, a lot of people were fumbling around in the dark.

Soon, teams around the world started working on solutions to this problem. Some involved extending existing operating systems to add packet-capture functionality: in other words, making it possible to convert an off-the-shelf computer workstation into a device that could sit on a network and collect all the packets sent or received by other workstations. One such solution, Berkeley Packet Filter (BPF), developed by Steven McCanne and Van Jacobson at the University of California at Berkeley, was designed to extend the BSD operating system kernel. If you use Linux, you might be familiar with eBPF, a virtual machine to safely execute arbitrary code in the Linux kernel: the “e” stands for “extended.” eBPF is one of the hottest modern features of the Linux kernel. It’s extremely powerful and flexible after many years of improvements, but it started as a little programmable packet-capture and filtering module for BSD Unix.

BPF came with a library, called libpcap, that any program could use to capture raw network packets. Its availability triggered a proliferation of networking and security tools. The first tool based on libpcap was a command-line network analyzer called tcpdump, which is still part of virtually any Unix distribution. In 1998, however, a GUI-based open source protocol analyzer called Ethereal (renamed Wireshark in 2006) was launched. It became (and still is) the industry standard for packet analysis.

What tcpdump, Wireshark, and many other popular networking tools have in common is the ability to access a data source that is rich, accurate, and trustworthy and can be collected in a noninvasive way: raw network packets. Keep this concept in mind as you continue reading!

Snort and Packet-Based Runtime Security

Introspection tools like tcpdump and Wireshark were the natural early applications of the BPF packet-capture stack. However, people started getting creative in their use cases for packets. For example, in 1998, Martin Roesch released an open source network-intrusion detection tool called Snort. Snort is a rule engine that processes packets captured from the network. It has a large set of rules that can detect threats and unwanted activity by looking at packets, the protocols they contain, and the payloads they carry. It inspired the creation of similar tools such as Suricata and Zeek.

What makes tools like Snort powerful is their ability to validate the security of networks and applications while applications are running. This is important because it provides real-time protection, and the focus on runtime behavior makes it possible to detect threats based on vulnerabilities that have not yet been disclosed.

The Network Packets Crisis

You’ve just seen what made network packets popular as a data source for visibility, security, and troubleshooting. Applications based on them spawned several successful industries. However, several trends eroded packets’ usefulness as a source of truth:

  • Collecting packets in a comprehensive way was becoming more and more complicated, especially in environments like the cloud, where access to routers and network infrastructure is limited.

  • Encryption and network virtualization made it more challenging to extract valuable information.

  • The rise of containers and orchestrators like Kubernetes made infrastructures more elastic. At the same time, it became more complicated to reliably collect network data.

These issues started becoming clear in the early 2010s, with the popularity of cloud computing and containers. Once again, an exciting new ecosystem was unfolding, but no one quite knew how to troubleshoot and secure it.

System Calls as a Data Source: sysdig

That’s where your authors come in. We released an open source tool called sysdig. We were inspired by a set of questions: What is the best way to provide visibility for modern cloud native applications? Can we apply workflows built on top of packet capture to this new world? What is the best data source?

Sysdig originally focused on collecting system calls from the kernel of the operating system. System calls are a rich data source, even richer than packets, because they don’t exclusively focus on network data: they include file I/O, command execution, interprocess communication, and more. They are a better data source for cloud native environments than packets, because they can be collected from the kernel for both containers and cloud instances. Plus, collecting them is easy, efficient, and minimally invasive.

Sysdig at that time had three separate components:

  • A kernel capture probe (available in two flavors, kernel module and eBPF)

  • A set of libraries to facilitate the development of capture programs

  • A command-line tool with decoding and filtering capabilities

In other words, it was porting the BPF stack to system calls. Sysdig was engineered to support the most popular network-packet workflows: trace files, easy filtering, scriptability, and so on. From the beginning, we also included native integrations with Kubernetes and other orchestrators, with the goal of making the tool useful in modern environments. Sysdig immediately became very popular with the community, validating the technical approach.

Falco

So what would be the next logical step? You guessed it: a Snort-like tool for system calls!

A flexible rule engine on top of the sysdig libraries, we thought, would be a powerful tool to detect anomalous behavior and intrusions in modern apps reliably and efficiently. Essentially the Snort approach, but applied to system calls and designed to work in the cloud.

So, that’s how Falco was born. The first (rather simple) version was released at the end of 2016 and included most of the important components, in particular the rule engine. Falco’s rule engine was inspired by the Snort one but designed to operate on a much richer and more generic dataset and was plugged into the sysdig libraries. It shipped with a relatively small but useful set of rules. It was largely a single-machine tool, with no ability to be deployed in a distributed way. We released it as open source because we saw a broad community need for it. And, of course, because we love open source!

Expanding into Kubernetes

As the tool evolved and the community embraced it, its developers expanded it into new domains of applicability. For example, in 2018 we added Kubernetes Audit Logs as a data source. This feature lets Falco tap into the stream of events produced by the Audit Log and detect misconfigurations and threats as they happen.

Creating this feature required us to improve the engine, which made Falco more flexible and better suited to a broader range of use cases.

Joining the Cloud Native Computing Foundation

In 2018 Sysdig contributed Falco to the Cloud Native Computing Foundation (CNCF) as a sandbox project. The CNCF is the home of many important projects at the foundation of modern cloud computing, such as Kubernetes, Prometheus, Envoy, and OPA. For our team, making Falco part of the CNCF was a way to evolve it into a truly community-driven effort, to make sure it would be flawlessly integrated with the rest of the cloud native stack, and to guarantee long-term support for it. In 2021 this effort was expanded by the contribution of the sysdig kernel module, eBPF probe and libraries to the CNCF, as a subproject in the Falco organization. The full Falco stack is now in the hands of a neutral and caring community.

Plugins and the Cloud

As years pass and Falco matures, a few things have become clear. First, its sophisticated engine, efficient nature, and ease of deployment make it suitable for much more than system-call-based runtime security. Second, as software becomes more and more distributed and complex, runtime security is paramount to immediately detecting threats, both expected and unexpected. Finally, we believe that the world needs a consistent, standardized way to approach runtime security. In particular, there is great demand for a solution that can protect workloads (processes, containers, services, applications) and infrastructure (hosts, networks, cloud services) in a converged way.

As a consequence, the next step in the evolution of Falco is modularity, flexibility, and support for many more data sources spanning across different domains. In summer 2021, Falco added a new plugin infrastructure that allows it to tap into data sources like cloud provider logs to detect misconfigurations, unauthorized access, data theft, and much more.

A Long Journey

Falco’s story stretches across more than two decades and links many people, inventions, and projects that at first glance don’t look related. In our opinion, this story exemplifies why open source is so cool: becoming a contributor lets you learn from the smart people who came before you, build on top of their innovations, and connect communities in creative ways.
