2 Concepts, architecture, and deployment of Fluentd

This chapter covers

  • Outlining Fluentd’s architecture and core concepts
  • Reviewing prerequisites and deployment of Fluentd, Fluent Bit, and Fluent UI
  • Executing basic configurations of Fluentd and Fluent Bit
  • Introducing configuration file structure

Chapter 1 looked at the theory, industry trends, and use cases that Fluentd can help us with. This chapter discusses how Fluentd works, including deploying and running the simplest of configurations to implement the traditional developer’s “Hello World.”

2.1 Architecture and core concepts

When you’re driving a car, it is a lot easier when you have some basic appreciation of how the vehicle is powered (e.g., gas, diesel, electric, liquefied petroleum gas). The mental models that come with such understanding mean we can learn what to expect—whether we can expect to hear the engine rev, whether it’s possible for the engine to stall, and how the gears work (if there are any). For the same reason, before we start working with Fluentd and Fluent Bit, it is worth investing time in understanding how these tools work. With that in mind, let’s run through some of the building blocks of Fluentd that will help build those mental models.

2.1.1 The makeup of a log event

Chapter 1 introduced the concept of log events. Understanding how Fluentd defines a log event is the most crucial thing in appreciating how Fluentd works, so let’s look at its composition. Each log event is managed as a single JSON object comprising three mandatory, nonrepeating elements, as described here and shown in figure 2.1:

  • Tag—Each log event has a tag associated with it. The tags are typically linked to the source initially through the configuration but can be subsequently manipulated within the configuration. Fluentd can apply conditional operations (routing, filtering, etc.) to the log events as necessary by using the tags. When using the HTTP interface, the tag can be defined in the call, as we will see.

  • Timestamp—This is derived from the log information or is applied by the input plugin. This ensures that the events are kept in series, an essential consideration when unifying multiple log sources and potentially trying to understand the sequence of events across components. This data is held as nanoseconds from epoch (1 January 1970 00:00:00 UTC).

  • Record—The record is the core event information after separating out the time. This means we can address the log content without worrying about locating the timestamp for the event and the tag needed for basic controls, as we’ll see later in the book. This provides an immediate benefit; whenever a log event is passed in from a Fluentd-aware adaptor, we can avoid initial parsing for the time. It is possible to translate the record into further detailed structures to make it easier to process. We see how to apply more meaning to the data later in the book.
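To make the three elements concrete, a single log event can be sketched as follows (the tag, timestamp, and record values here are purely hypothetical):

```
tag:    App.Subsystem1
time:   1611321600.000000000      (held internally as nanoseconds from epoch)
record: {"level":"INFO", "message":"user login succeeded"}
```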

Figure 2.1 Makeup of log event

Once captured, other plugins can then work with the existing tags to modify, add, and extend them as necessary. Tags can have wildcards and other logic applied to them. For example, if we have several separate logs associated with one solution (call them subsystems 1, 2, and 3), we could tag each log file as App.Subsystem1, App.Subsystem2, and App.Subsystem3. The processing of the logs could then be addressed by using a wildcard (e.g., App.*). We can set the filter to be more specific for handling only a specific subsystem’s log events (e.g., App.Subsystem2).
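As a sketch of how such tag-based handling might appear in a configuration (the tag names, plugin choices, and pattern are illustrative, not from a real deployment):

```
# Apply a filter only to Subsystem2's log events
<filter App.Subsystem2>
  @type grep
  <regexp>
    key level
    pattern /ERROR/
  </regexp>
</filter>

# Process every subsystem's events using a wildcard tag pattern
<match App.*>
  @type stdout
</match>
```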

2.1.2 Handling time

Given the importance of timestamps in the logs, all systems that need to work together must report against a common clock/time. In addition, the time must not be subject to movements for daylight savings. Without this, every time the clocks go back, logs will get out of sync. When clocks are moved forward, the logs will see an irregular period of no log events being recorded. This can trigger anomalies if any time-based analysis (analysis of event throughput, measurement of mean time between errors, watching for heartbeat events, etc.) is performed.

This consideration is compounded by the fact that systems may be working together across multiple time zones. Therefore, all systems need to run against a collectively agreed upon time. The typical solution to this is to link systems to Coordinated Universal Time (UTC). However, when we need to have millisecond precision on the timestamps across multiple servers to ensure correct order, something is required to keep them in sync.

Time synchronization is handled by linking servers to a common time source and then using a protocol to request a time to align. This protocol is known as Network Time Protocol (NTP). When configuring a server, it is highly recommended to ensure that NTP is configured. Many technologies and service providers offer a free standard NTP service to synchronize with. There is a limit to this; the duration for the current time to reach different servers can differ by a few milliseconds or nanoseconds (depending on the location of the NTP service). This is known as clock or time skew. Despite best efforts, log entries may very occasionally appear out of step when aggregating across multiple servers.

NTP and clock skew

More specific detail on NTP and clock skew can be found at www.ietf.org/rfc/rfc1305.txt.

Most operating systems provide an NTP client process (or daemon) that can be activated (if not defaulted to be active) and configured to sync with an NTP server. The closer the NTP server, the lower the risk of skew.

2.1.3 Architecture of Fluentd

Fluentd’s operations are prescribed by a configuration file (which may include other configuration files, but this will be addressed later in the book). The configuration file describes how and, in some cases, when a plugin should be applied. A good number of plugins are incorporated into the core of Fluentd, so they require no additional installation—for example, the tail plugin that operates a bit like the Linux tail -f command. For those less familiar with Linux/Unix utilities, the tail -f command provides the means to see on the console what is being added to a file as it occurs.
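For readers who haven’t used it, the behavior of tail can be seen with a quick experiment (the file name is arbitrary):

```shell
# Create a small log file and show its last lines
printf 'line1\nline2\nline3\n' > demo.log
tail -n 2 demo.log
# tail -f demo.log would instead keep the file open,
# printing each new line as it is appended
```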

In chapter 1, we introduced the idea of plugins and illustrated them with some examples. Before we build on this and examine the types of plugins in more detail, we should clarify a point of terminology. If you read the Fluentd documentation, it refers to directives; these can overlap with plugin types. But the relationship between plugin types and directives is not one-to-one, as plugins can play supporting or helper roles and therefore not correspond to a directive. Later in the chapter, as we look at the “Hello World” example, we’ll see the directives and plugins, and how Fluentd knows where to pick up a configuration file.

The following list focuses on the core plugin types and where they map to directives we have identified. In addition to this, we have highlighted the more common plugin interrelationships:

  • Input—In terms of the configuration file, the input plugins will correlate to a source directive. An input can leverage parser plugins that can take the raw log text and assert structural meaning. For example, they can extract key values from the message text, such as log event classifications needed for later processing. Inputs range from files to data stores to direct API integrations.

  • Output—As a type of plugin, these provide us with the means to store (e.g., file, database) or connect to another system (including another Fluentd node) to pass on the log events. The output plugin aligns with the match directive within the configuration file—something that is not obvious at this stage but, as we illustrate the use of Fluentd, will become more apparent. The output plugin can leverage formatter, filter, buffer, and service discovery plugins. The more generic input plugins have an equivalent output.

  • Buffer—The buffer plugin type focuses on the batching up and temporary caching of log events so that the I/O workload can be optimized. This issue will be addressed in more depth as we progress through the book.

  • Filter—This plugin type applies rules through which we can control where log events can go. This plugin is engaged with the output plugin.

  • Parser—This plugin’s task is to take the log event, extract key values, and apply additional needed structure to the captured content. This is key when taking content from sources such as log files, which will start effectively as a single line of text. This can range from regex and grok to domain-specific logic.

  • Formatter—When content is output, it needs to be produced so that the data can be handled by the consuming component. For example, the content may need to be structured for consumption by Prometheus or Grafana, which expect specific formats, or rendered as a human-readable message for PagerDuty. As a result, the formatter plugin gets used by the output plugins within the match directives.

  • Storage—As we will see shortly, the performance and efficiency of Fluentd is a tradeoff with the way we need to handle log events. Storing log events means we can keep the events (often temporarily) until they need to be processed. Temporary storage, such as caches, can give us performance gains, but at the risk of losing the event in a failure. Some storage options are therefore more durable to mitigate such a risk. We will use storage plugins in several different ways throughout the book.

  • Service discovery—When this plugin is used, it typically works in tandem with the output plugin. Its purpose is to help connect to other Fluentd nodes, as we will explore later in the book. This type of plugin addresses how the target servers are identified/found within a network, from a list of server IPs in a reloadable config to using specific parts of a DNS record.
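The way these plugin types combine inside directives can be sketched in a configuration fragment such as the following (the paths, tags, and intervals are illustrative only):

```
<source>
  @type tail                 # input plugin
  path ./logs/app.log
  tag App.Subsystem1
  <parse>
    @type json               # parser plugin helping the input
  </parse>
</source>

<match App.*>
  @type file                 # output plugin
  path ./logs/consolidated
  <buffer>                   # buffer plugin helping the output
    flush_interval 10s
  </buffer>
  <format>
    @type json               # formatter plugin shaping the written output
  </format>
</match>
```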

In figure 2.2, we represent the core Fluentd building blocks, along with supporting elements that exist to help the extension, adoption, and use of Fluentd. Note that the specific plugins implemented in the diagram are only a subset of those built in the standard deployment and a fraction of those deployable and used by Fluentd. As we progress through the book, all of these building blocks will be examined in depth, from configurations to tune the engine to how the plugin base provides the foundations for controlling all plugin behavior. But appreciating the different blocks and their relationships will help from the outset.

Figure 2.2 View of the Fluentd architecture illustrating the core building blocks and optional support resources available depending on your context

2.1.4 Fluent configuration execution order

  • Log events are consumed only once within a Fluentd or Fluent Bit instance unless Fluentd is told to explicitly copy the log event (using a feature within the core of Fluentd, which we will address later in the book). This sequencing is illustrated in figure 2.3.

Figure 2.3 Illustration of order impact in a Fluentd configuration

  • The order in which operations are defined within a configuration file is significant. The first directive that matches an event will become the consumer unless that event is copied. Therefore, as a general practice:

    • When you want all log events to undergo common operations, define those directives early in the configuration, copying the events so that later, targeted directives can still receive them.
    • Catch-all directives should be late in the configuration.
    • Targeted directives should precede the catch-all directives.
  • Fluentd by default is single-threaded. This helps to ensure that the time series is not compromised. Fluentd can be configured to run in a concurrent manner (multiprocess rather than threaded) by changing the configuration, and we will look at that in chapter 7. It does mean that if you create a complex series of log event operations, it’s possible that Fluentd cannot process events as fast as they are created, creating a bottleneck. There are strategies for avoiding this, but they will further complicate the whole process.
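The ordering rules above can be illustrated with a minimal sketch (the tags are hypothetical); the targeted match must appear before the catch-all, or it will never receive an event:

```
# Targeted directive first: consumes only Subsystem2's events
<match App.Subsystem2>
  @type file
  path ./logs/subsystem2
</match>

# Catch-all last: consumes anything not already matched
<match App.**>
  @type stdout
</match>
```

If the two directives were reversed, App.** would consume every event, and the Subsystem2 directive would never fire.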

Single vs. Multithreaded

The challenges of multithreading are varied, from coordination overhead when more threads are running than processor cores to mutual thread-locks (two threads waiting for each other). When it comes to time-series events, keeping things in sequence or correcting order is important. If not carefully applied, multithreading can create race conditions that may lead to events getting out of sequence. To better understand race conditions, an excellent source is https://devopedia.org/race-condition-software.

2.1.5 Directives

Previously, we mentioned directives within Fluentd, and it is easy to mix up directives and plugins. Directives provide a framework for grouping plugins to achieve a logical task, such as outputting log events to a destination. You’ll see that directives are declared in the same way as XML elements by being started and ended with angle brackets. It is possible to supply attributes within the element, such as tag filtering, as is the case with the match example. Within the directive, we then identify the plugin and supply its configuration as name-value pairs. As we get to more sophisticated examples, you’ll see that we can nest things, including helper plugins.
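Putting that description together, the anatomy of a single directive looks like this (the tag pattern and plugin choice are illustrative):

```
<match App.*>        # directive, with a tag-filtering attribute
  @type stdout       # the plugin the directive engages
  # further plugin configuration follows as name-value pairs
</match>
```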

If a command or plugin must be called directly by the logic that makes Fluentd process a stream of log events, then it is a directive. While this is very abstract at this stage, the idea and subtlety will become more apparent as we progress through the book and its examples. As figure 2.4 illustrates, we can visualize the directives, plugins, and helper plugins that appear in configuration files.

Figure 2.4 Relationships between Fluentd directives in the context of Fluentd’s execution order (central column—Source, Filter, Match) and native plugins (parser, buffer, formatter)

The directives illustrated in figure 2.4 are summarized in table 2.1. We will examine each of these directives in depth in part 2 of this book.

Table 2.1 Fluentd directives

  source—The source directive tells Fluentd to receive/source log events, as we’ve just seen.

  match—This is about matching log events to other operations, including the output of log events.

  filter—This controls which events should be handled by one or more processes—typically referred to as a pipeline.

  @include—This tells Fluentd to bring in other configuration files to assemble a complete set of operations, just as import or include statements do in conventional code.

  label—The label provides a grouping mechanism for log events, which provides significantly more capability than just using tags.

  system—This tells Fluentd how to configure and behave internally (e.g., the setting of log levels).
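A skeletal configuration bringing several of these directives together might look like the following (the file name and values are illustrative):

```
<system>
  log_level info        # system directive: Fluentd's own behavior
</system>

@include sources.conf   # pull in source directives from another file

<match **>
  @type stdout          # match directive: catch-all output
</match>
```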

2.1.6 Putting timing requirements into action

If you want to see how much you have absorbed so far, try answering these questions. The answers follow these questions.

  1. What are the three key elements of a log event within Fluentd/Fluent Bit?

  2. What is the recommended time standard to link time servers with?

Answers

  1. We introduced these in section 2.1.1; the elements of a log event are

    • Timestamp—Representation of the log event occurrence
    • Record—The body of the log event
    • Tag—Associated with each log entry and used to route the log event
  2. As you may recall, in section 2.1.2, we recommend linking your NTP servers using UTC.

2.2 Deployment of Fluentd

In this section, we will deploy Fluentd and tools such as the LogGenerator (sometimes referred to as the LogSimulator) to enable us to run the “Hello World” scenario and the subsequent examples and exercises. All the configuration files for Fluentd and the simulator can be found in the book’s GitHub repository (http://mng.bz/Axyo). Within the repository, each chapter has its own set of folders. Note that the configuration files in the repository will differ slightly from those shown in the configuration examples in the book, as they include additional helpful comments. We assume that the complete code and configuration samples will be downloaded either from Manning or via our GitHub repository for the book. Each chapter folder contains subfolders for code, configurations, and solutions. The LogGenerator (more on this later) has been downloaded from GitHub (https://github.com/mp3monster/LogGenerator) and copied into the root folder for the chapters (e.g., the root shown in figure 2.5).

Figure 2.5 Directory structure used in the book for examples and solutions

NOTE As Fluentd, Fluent Bit, and the LogSimulator are used throughout the book, we have incorporated the instructions within the chapter. In later chapters, where we use other utilities and products for one or possibly two chapters, we have provided the instructions in appendix A.

2.2.1 Deploying Fluentd for the book’s examples

We already established in chapter 1 that Fluentd and Fluent Bit are both very capable when it comes to the means to deploy onto diverse platforms. That creates an interesting challenge for this book. Do we describe deploying Fluentd and Fluent Bit onto the widest variety of platforms or focus on just one? Do we make you work with Docker and bundle everything up in an image?

The approach we took in this book was to support Windows first; this is predicated on the fact that in trying, prototyping, and experimenting with Fluentd, you are likely using a desktop or laptop computer rather than an enterprise server. Windows is the most dominant OS on desktop and laptop machines, so it makes sense to focus on that environment.

However, to make it easier to take the guidance in this book to enterprise servers, or if you’re fortunate enough to have a Mac or you’re a committed Linux fan and have installed your favorite flavor of Linux OS, we have highlighted differences between Linux and Windows. The majority of instructions will include the Linux equivalent. Those working with Linux or macOS will most likely know that Linux is just the kernel and that the layers above this, such as the UI layer, and installation managers differ across the Linux flavors. This means you may need to tweak the commands provided to work on your particular flavor of OS.

Docker image

It is possible to also download a prepared Docker image made available via Docker Hub (https://hub.docker.com/r/fluent/fluentd/) or directly from the Fluentd GitHub site (https://github.com/fluent/fluentd-docker-image). For production environments, this approach is worth considering and is explored further in chapter 8. In most of the book, utilizing Docker will simply add additional effort unless you’re entirely conversant with using Docker.

2.2.2 Deployment considerations for Fluentd

When considering the deployment of Fluentd into production, we need to consider volume metrics—that is, the amount of log data needing to be captured, filtered, routed, and stored. In part 3 of the book, we will focus on the ability of Fluentd and Fluent Bit to be scaled out and distributed. But to start with, let us assume that we are working in an environment that does not demand such levels of scaling. Even in a simple deployment, we should be aware that the computing effort for log processing should be less than the computing effort for the core application. Remember, each time log events are stored or transmitted, the operation generates a lot of I/O activity, which carries a computational overhead. If you are familiar with low-level computer operations, you will appreciate that every process comes with an overhead:

  • Every network message is topped and tailed with routing, verification, and details such as the size of the message.

  • Every file write requires the use of the hardware to locate a chunk of physical storage that can be used, record the details of the block of storage used, and mechanically position the writing device for physical media.

The more we can group log events in a cache and transmit them as a block, the more efficient the use of resources is. Like all things in life, there is a tradeoff. We cache before writing to storage, which means the data is slower reaching the end of the log event processing. The longer data is working through a process, the more likely that a power loss or component failure will result in data loss. For this chapter, we only need to make sure our environment has enough resources to run; considerations of performance versus the risk of data loss aren’t necessary.
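This tradeoff surfaces directly in buffer configuration; for example (the values are illustrative, and chunk_limit_size and flush_interval are standard buffer parameters):

```
<match App.**>
  @type file
  path ./logs/aggregated
  <buffer>
    chunk_limit_size 8MB    # batch this much before writing, reducing I/O operations
    flush_interval 30s      # but events may sit in memory up to 30s, at risk in a failure
  </buffer>
</match>
```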

2.2.3 Fluentd minimum footprint

Fluentd resource requirements are minimal (see table 2.2) by modern machine specs, but are worth noting when dealing with small footprint setups.

Table 2.2 Fluentd minimum hardware footprint

  RubyInstaller size: 130 MB
  Ruby installed storage needs (with DevKit): 1 GB (80 MB for basic Ruby, plus 820 MB for the DevKit)
  Memory required: ~20 MB
  Fluentd additional storage: 300 KB
  Ruby minimum version: Ruby 2.x (against Fluentd v1.x)

2.2.4 Simple deployment of Ruby

To get ready to run Fluentd, we need to first install Ruby. This is best done with the latest stable version of Ruby using your operating system’s package framework. Links to the different installation packages can be found via www.ruby-lang.org. For Windows, the Downloads page takes us to https://rubyinstaller.org to retrieve the RubyInstaller. When we get to chapter 8, we will need to do a bit of development, so we should install the software development kit (SDK) version of Ruby (shown as Ruby+DevKit on the website).

Once downloaded, run the installer; it will take you through the steps to define the preferred location, and it will also ask if you want to install MSYS2—say yes. MSYS2 is needed for RubyGems with a low-level C dependency, such as plugins interacting with the OS. It provides several development-related tools, such as MinGW, that allow Ruby development to use Windows C native libraries. This means we should have MSYS2, and we recommend taking the complete installation with MinGW to support any possible development requirements later.

NOTE Additional information about DevKit is available in the liveBook version of Rails 4 in Action by Ryan Bigg, et al. (Manning, 2015) at http://mng.bz/ZzAR.

The installer should add Ruby to the Windows PATH environment variable. (Appendix A provides details on the PATH environment variable.) When checking, you need to confirm that the bin folder for Ruby is included. If the Ruby directory path is not in the PATH environment variable, we need to follow the instructions in appendix A to add the full Ruby path. Once the path has been set, executing the command ruby --version should display the installed version.

NOTE It is worth noting an open source package manager for Windows called Chocolatey (https://chocolatey.org/), which feels more like a Linux package manager. Chocolatey can be used as an alternative means to install Ruby.

For Linux users, all the major OSes have a relevant package manager with a recent stable installation—from Homebrew for macOS to apt, yum, pkg, and others on the Linux flavors. When there is an option, as with Windows, it is worth installing everything to support the development activities undertaken in chapter 9. Like Windows, we need to confirm the path has been correctly set using the instructions in appendix A. We also can verify Ruby using the same command, ruby --version. In addition, we need to verify whether the package manager has included the RubyGems package manager. Check this by running the command gem help. This will return the RubyGems help information or fail. If this fails, then the following steps are needed (replacing x.y.z in the next steps with the latest stable version):

wget http://production.cf.rubygems.org/rubygems/rubygems-x.y.z.tgz
tar xvf rubygems*
ruby setup.rb

2.2.5 Simple deployment of Fluentd

Fluentd can be installed in a variety of different ways. Treasure Data (introduced in chapter 1) provides a Windows installer for Fluentd, but it should be noted that the installer introduces a prefix of td into file and folder names. The Treasure Data installer also includes additional plugins not included in the standard installer.

There is a wealth of ways to install Fluentd and its dependencies with different benefits and nuances. We will install Fluentd using RubyGems for the following reasons:

  • Gems package installer is platform-neutral, so the installation process is the same for Linux, Windows, and many other environments.

  • Gems are the easiest way to install plugins not included in the core of Fluentd.

  • We have Gems installed (needed to help install Ruby dependencies), so we can keep our approach consistent.

To install Fluentd this way, we simply need to run the following command:

gem install fluentd

As long as you have connectivity to https://rubygems.org/, the relevant gems, including dependencies, will safely download and install. In enterprise environments, these sites may need to be accessed through a proxy server or a local gems server. The installation can be tested by running the following command:

fluentd --help

This will display the help information for Fluentd. It should also be possible to see the Fluentd and other gems installed in the deployment location lib\ruby\gems\2.7.0\gems (and the equivalent path for other OSes).

In addition to the core Fluentd, the installation also provides some secondary tools, some of which we will use throughout the book. The major tools provided are summarized in table 2.3.

Table 2.3 Fluentd support tools provided with an installation

  fluent-binlog-reader—Fluentd can create binary log files (giving compression and performance benefits)—for example, when file caching. This utility can be used to read the file and generate readable content.

  fluent-ca-generate—This is a utility for creating basic (self-signed) certificates that can be used to encrypt communications between Fluentd/Fluent Bit nodes.

  fluent-cat—The fluent-cat tool provides a means to inject a single log message into Fluentd; it does require the forward plugin to be configured. For example:

  echo '{"message":"hello"}' | fluent-cat debug.log --host localhost --port 24224

  This command would send a log event to the local Fluentd instance configured to listen on port 24224 using the forward plugin. We can use this to help test the routing, filtering, and output steps. But, crucially, it does not allow us to check the input plugin configurations (hence the LogSimulator).

  fluent-debug—This is a utility to help with remote debugging, used in conjunction with the Ruby tooling.

  fluent-gem—This is essentially an alias to the Ruby gem command, which will list all the gems available.

  fluent-plugin-config-format—This provides the means to interrogate a plugin to obtain details of the configuration parameters the plugin will support. The output could be characterized as a README document. As some plugin implementations may support multiple types of plugin (e.g., input and output), it is necessary to specify the plugin type. For example (on both Windows and Linux), the command

  fluent-plugin-config-format -f txt input tail

  will retrieve the text format of the tail input plugin’s configuration details. This utility is ideal for being included within a continuous integration pipeline for custom-built plugins, as it can generate documentation in several formats.

  fluent-plugin-generate—This generates a code skeleton for plugin development. The template includes a Gem file, README, stubbed Ruby code for the plugin, and a skeleton test framework.

A couple of OS differences

Linux- and Unix-based operating systems support a framework of interrupt signals. These signals can be sent to an application to control its behavior. Perhaps the most commonly known of these is SIGHUP. Fluentd can use these signals to trigger operations such as reloading the configuration file without needing to restart. Table 2.4 summarizes the essential interrupts and their impact.

Table 2.4 Linux signals and how Fluentd will react to them

  SIGINT or SIGTERM—This tells Fluentd to gracefully shut down so that it clears everything in memory, and any file buffering is left in a clean state. If another process is calling Fluentd, it is better to stop that process first, as it can prevent the shutdown from completing.

  SIGUSR1—This tells Fluentd to ensure that all of its cached values, including its log events, are flushed to storage and then refresh the file handles to the file storage. This is then repeated based on the flush_interval configuration setting.

  SIGUSR2—Secures and gracefully handles the reloading of the configuration. It can be considered graceful as it ensures any cache is safely stored before reloading the configuration, so no log events are lost.

  SIGHUP—This interrupt is best known for forcing a configuration to reload. It performs the same operations as SIGUSR2 but also flushes its internal logs, so no internal log information is lost.

  SIGCONT—This signal will get Fluentd to record its internal status—thread information, memory allocation, and so on.

Sending Linux kill commands to a Fluentd process—for example, kill -s USR1 3699, where 3699 represents the process ID for Fluentd—will result in Fluentd interpreting the signal as a SIGUSR1 signal. At present, there is no Windows-equivalent way to send these signals, although several change requests have been submitted to the project for such features.

File handles

Within a Linux file system, the number of file handles that can be used at any one time can be controlled, unlike Windows, which has these limits driven entirely by the OS version and architecture (e.g., 32 or 64 bit). Additionally, Linux uses file handles for real files, but these handles also represent things like network connections. The default number of file handles can be restrictive for Fluentd. It is not unusual to adjust the number of file handles held open in production environments. Manipulating the file-handle limits can be done by editing configuration files or using the Linux ulimit command. More detail can be found at https://linuxhint.com/linux_ulimit_command/. The number of file handles shouldn’t be a problem for the examples and scenarios provided, but when ramping up the volume in a production context, it is something to be aware of. The correct number of file handles depends on the number and speed of files being written, the number of network ports being supported, and so on.
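Checking the limits is straightforward; raising them is a one-line change for the current shell session (the raised value would be workload-specific, so none is shown here):

```shell
ulimit -n     # display the current soft open-file limit for this shell
ulimit -Hn    # display the hard limit, the ceiling any raise must stay under
# To raise the soft limit for this session (up to the hard limit):
#   ulimit -n <new-limit>
```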

2.2.6 Deploying a log generator

Ideally, we want to prove our configuration for input plugins and confirm configuration for things like log rotation. We want a configuration-driven utility that can continuously send log events. We have one available at https://github.com/mp3monster/LogGenerator and will be using this in the subsequent chapters. This tool provides several helpful features for us:

  • Replay the log events from an existing log file, writing them with current timestamps and with the same time intervals between events as when the logs were originally written.

  • Take a test log file that describes the time gap and log body, and play it back with the correct intervals between events.

  • Write log files based on a pattern, meaning different log formats can be generated.

  • Send the logs via the Java logging framework to simulate an application using a logging framework.

The LogGenerator GitHub repository includes extended documentation on how the tool can be used. The utility is written using Groovy, which means at its heart is Java and the use of standard Java classes and libraries. Groovy adds several conveniences over Java. Specifically, it executes as a script, keeping development quick and easy and making it simple to tweak for your own needs, and it includes convenience classes that make working with REST and JSON very easy. Not everyone wants to install Groovy or modify the script. As a result, we have taken advantage of Groovy’s relationship with Java to compile and package it to a JAR file, making it possible to execute without installing Groovy if preferred. The JAR is available to download from GitHub as well.

Java installation

To install Java, you can either use a package manager or download it from www.java.com/en/download/. The tool has been implemented so that Java 8 or later will work, but you need the Java Development Kit (JDK) rather than the Java Runtime Environment (JRE). Once Java is downloaded and installed, ensure that the correct version is set up in your PATH environment variable and in JAVA_HOME. We assume that you do not have any other applications that use Java and depend on a different Java version. If you do, we recommend writing a script to set these variables each time you start a new console to run the LogGenerator; this approach is illustrated for the Groovy setup. You can check which version of Java is in use with the command java -version.

Groovy installation

If you want to use the LogSimulator from the prepared JAR, you can skip this section; but if you want to use the Groovy version, see how it works, or modify it, you'll need the following steps. With the prerequisite Java installed, we can now install Groovy (download from https://groovy.apache.org/download.html or install it using a package manager). As with Java, you want Groovy on the PATH environment variable and GROOVY_HOME set up. You can confirm whether Groovy is suitably installed using the command groovy --version. The following code fragments are example scripts for ensuring the environment variables are set up. This is the Windows setup:

set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_221
set PATH=%JAVA_HOME%\bin;%PATH%
echo Set Shell to Java
java -version
set GROOVY_HOME=C:\Program Files\Groovy-3.0.2
set PATH=%GROOVY_HOME%\bin;%PATH%
echo Set Shell to Groovy
groovy --version

The Linux version of this script would be

export JAVA_HOME=/usr/lib/jdk1.8.0_221
export PATH=$JAVA_HOME/bin:$PATH
echo Set Shell to Java
java -version
export GROOVY_HOME=/usr/lib/Groovy-3.0.2
export PATH=$GROOVY_HOME/bin:$PATH
echo Set Shell to Groovy
groovy --version

The simulator uses a properties file to control its behavior and uses a file that describes a series of log entries to replay. We will use this in later chapters to see how log rotation and other behaviors can work. Each book chapter has a folder containing the relevant properties files and log sources to help with that chapter, as shown in figure 2.5. With the LogSimulator copied into the download root folder as previously recommended, run this command:

groovy LogSimulator.groovy Chapter2\SimulatorConfig\tool.properties

We can see an example of the console output when running the LogSimulator as a Groovy application in figure 2.6.

Figure 2.6 LogSimulator example output when in verbose mode, using the HelloWorld-Verbose.properties file and Fluentd running with the associated HelloWorld.conf file

Running LogSimulator as a JAR

To use the JAR version of the LogSimulator, the JAR file needs to be downloaded into the parent directory of all the chapter resource folders. Then the groovy LogSimulator.groovy part of the command can be replaced with java -jar LogSimulator.jar, so the command would appear as

java -jar LogSimulator.jar Chapter2\SimulatorConfig\tool.properties

We will assume you've installed Groovy and will run the LogGenerator using the groovy command for the rest of the book. But as you can see, the only difference is the part of the command that invokes Java or Groovy with the JAR or Groovy file. The GitHub repository includes all the details on how the JAR file is generated if you wish to extend the tool and re-create the JAR.

LogSimulator in more detail

If you would like to know what is going on in more depth, then edit the tool.properties file and change the verbose property from false to true. This will display on the console the log entries defined in the file small-source.txt. All the properties for the simulator are explained in the documentation at https://github.com/mp3monster/LogGenerator.

2.2.7 Installing Postman

An easy-to-use tool is needed to send single log events to exercise the Fluentd configuration in our “Hello World” scenario. While utilities such as cURL can be used, we have elected to use Postman with its friendly UI and ability to work across multiple platforms. Postman is a well-known tool that supports most environments (Windows, macOS, Linux, etc.). Postman is free for individual use, and the binary can be retrieved from www.postman.com/downloads/.

For Windows, this is an installer that will resolve the appropriate file locations. For Linux, the download is a tarred gzip file that will need to be unpacked (e.g., tar -xvf Postman-linux-x64-8.6.2.tar.gz). Once Postman is installed/untarred, ensure that it can be started—for Windows, this can be done with the installed links.

2.3 Bringing Fluentd to life with “Hello World”

Now that we’ve looked at the architecture of Fluentd and deployed it into an environment, let’s bring this to life.

2.3.1 “Hello World” scenario

The “Hello World” scenario is very simple. We will use the fact that Fluentd can receive log events through HTTP and simply see the console record the events. To start with, we will push the HTTP events using Postman. The next step will be to extend this slightly to send log events using the LogSimulator.

2.3.2 “Hello World” configuration

Before running the example, let us quickly look at the configuration file (see listing 2.1). As you can see, we have provided some comments within the configuration file; a comment can appear anywhere by leading with a hash (#) character. The configuration between <system> and </system> consists of instructions to Fluentd on how its internals should work; in this case, use info-level logging. Then we have used a source directive to define the origins of log events using the built-in HTTP plugin, which the @type attribute identifies. The name-value pairs that follow are treated as attributes or properties for that plugin; for example, here we have defined the use of port 18080 to receive log events.

We then define an output using the match directive. The asterisk in the match directive is a wildcard, telling the match directive that any tag can be processed by the output plugin, in this case, standard out, which will appear in the console. The configuration file used in this example is stripped to the bare minimum, defining just the input and output parameters for each plugin and a couple of illustrative comments.

Listing 2.1 Chapter2/Fluentd/HelloWorld.conf

# Hello World configuration will take events received on port 18080 using
# HTTP as a protocol
 
# set Fluentd's configuration parameters
<system>
    Log_Level info             
</system>
 
# define the HTTP source which will provide log events
<source>                       
    @type http                 
    port 18080                 
</source>
 
# accept all log events regardless of tag and write them to the console
<match *>                      
    @type stdout
</match>

Set the default log level for Fluentd—because we have set the level to info, this is not strictly necessary, as that is the default.

This is a source directive.

@type indicates the plugin type.

Lines following a plugin define configuration parameters for that plugin.

The match directive defines which log events will be allowed into the plugin.
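To make the wildcard behavior of the match directive concrete, the following sketch imitates Fluentd-style tag matching, where * matches exactly one dot-separated tag part and ** matches zero or more trailing parts. This is a simplified illustration (it ignores features such as {a,b} alternatives and only supports ** as the final part), not Fluentd's actual implementation:

```python
def tag_matches(pattern: str, tag: str) -> bool:
    """Simplified Fluentd-style tag matching: '*' matches one
    dot-separated tag part; a trailing '**' matches zero or more parts."""
    pattern_parts = pattern.split(".")
    tag_parts = tag.split(".")
    if pattern_parts[-1] == "**":
        head = pattern_parts[:-1]
        if len(tag_parts) < len(head):
            return False
        # only the leading parts need to match; '**' absorbs the rest
        return all(p in ("*", t) for p, t in zip(head, tag_parts))
    if len(pattern_parts) != len(tag_parts):
        return False
    return all(p in ("*", t) for p, t in zip(pattern_parts, tag_parts))
```

So a pattern of * accepts a single-part tag such as test, while test.** would also accept test.cycle.a.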

2.3.3 Starting Fluentd

As the Fluentd service is on our PATH, we can launch the process anywhere with the command fluentd. However, without a parameter defining the config location, the tool will look in different places depending on the environment and installation process. For both Windows and Linux, Fluentd will try to resolve the location /etc/fluent/fluent.conf; on Windows, this will fail unless the command is run within a Linux subsystem. Since we are not using the default to start Fluentd, navigate the shell to wherever you have downloaded the configuration file, or include the full path to the configuration file as the parameter. Then run the following command:

fluentd -c HelloWorld.conf

To run the Fluentd command from the root of the downloaded resources, which will be the norm for the rest of the book, the command would be

fluentd -c ./Chapter2/Fluentd/HelloWorld.conf

This command will start Fluentd, and we will see the information displayed on the console as things start up, including the configuration being loaded and checked. When running Fluentd or Fluent Bit on Windows, depending upon the permissions for your user account, you may get a prompt, as shown in figure 2.7. This prompt occurs because Fluentd and Fluent Bit will, by default, expose access points to the network.

Figure 2.7 Windows prompting to allow Fluentd or Fluent Bit (depending on what is being started) access to use the network

We should, of course, allow access. Without it, both Fluentd and Fluent Bit will fail. Within a Linux environment, the equivalent security controls are established through IPTables rules and possible SELinux configuration. As Linux environments can vary more than Windows, it is worth having a good Linux reference to help set up and troubleshoot any restrictions. Manning has several such titles, such as Linux in Motion by David Clinton (www.manning.com/livevideo/linux-in-motion).

The next step is to send a log event using Postman. Once Postman has started, we need to configure it to send a simple JSON payload to Fluentd. Figure 2.8 shows the settings in the header.

Figure 2.8 Defined JSON payload to send to Fluentd using Postman

We also need to set the Body content, as we’re going to use a POST operation. By selecting Body (and the Raw option) on the screen, we can then key into the body field {"Hello" : "World"}. With this done, we’re ready now to send. We see this configuration in figure 2.9.

Figure 2.9 Setting the message body in Postman

Click the Send button in Postman. Figure 2.10 shows the result. You may have noticed that in the API call, we have not defined a time for the log event; therefore, the Fluentd instance will apply the current system time.

Figure 2.10 Fluentd output after sending the REST event—note the last line showing the output of the received event
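If you prefer code to Postman, the same call can be made with a short Python sketch using only the standard library. The tag name test in the URL path is an arbitrary example, and Fluentd must be running with HelloWorld.conf for a real call to succeed:

```python
import json
from urllib import request

def post_log_event(record: dict, tag: str, port: int = 18080) -> int:
    """POST a JSON log event to an HTTP input; the URL path becomes
    the event's tag. Returns the HTTP status code."""
    req = request.Request(
        f"http://localhost:{port}/{tag}",
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=5) as response:
        return response.status

# Example (with Fluentd listening on port 18080):
#   post_log_event({"Hello": "World"}, "test")
```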

While this configuration is as “useful as a chocolate teapot,” as the expression goes, it does illustrate the basic idea of Fluentd—the ability to take log events and direct them (explicitly or implicitly) to an output. Let’s finish this illustration by using the LogSimulator to create a stream of log events.

A new shell window is required to run the LogSimulator. Within the shell, navigate to where the configurations have been downloaded. Each chapter's folder contains a folder called SimulatorConfig; depending upon the chapter, you will find one or more property files there. Inside the property file is a series of key-value pairs that control the LogSimulator's behavior, including references to the log file to replay or the test data. These references are relative, meaning we need to be in the correct folder (the parent folder of the chapter folders) to start the simulator successfully. We can then start the LogSimulator with the command

groovy LogSimulator.groovy Chapter2\SimulatorConfig\HelloWorld.properties

or, if you choose to use the JAR file

java -jar LogSimulator.jar Chapter2\SimulatorConfig\HelloWorld.properties

Remember to correct the slashes in the file path for Linux environments. The LogSimulator is provided with a configuration that will send log events using a log file source using the same HTTP endpoint. This will result in each of the log events being displayed on the console.

2.4 “Hello World” with Fluent Bit

Fluent Bit, as previously mentioned, is written in C/C++, making the footprint very compact. The downside of this is that it requires more effort to build Fluent Bit for your environment. You will need to be comfortable with the GNU Compiler Collection (GCC) (https://gcc.gnu.org/), which is typically available on Linux platforms, or the cross-platform C compiler Clang (https://clang.llvm.org/), which can work in a GCC mode. For this book, we aren't going to delve any further into the world of C/C++ compilation. This means downloading one of the prebuilt binaries or using one of the supported package managers, such as apt and yum. For Windows, Treasure Data has provided Windows binaries (available at https://docs.fluentbit.io/manual/installation/windows). Because the binaries are provided by Treasure Data, the created artifacts make use of the prefix td. For simplicity and alignment to the basic version of Fluent Bit, we recommend downloading the zip version. We have used the zip download approach for our examples.

Unpack the zip file to a suitable location (we will assume C:\td-agent). To make life easier, it is worth adding the bin folder (e.g., C:\td-agent\bin) to the PATH environment variable, as we did with Fluentd.

We can check that Fluent Bit has been deployed with the following simple command:

fluent-bit --help

This will prompt Fluent Bit to display its help information on the console.

2.4.1 Starting Fluent Bit

The obvious assumption would be that as long as we limit our Fluentd configuration file to the plugins available in a Fluent Bit deployment, we can use the same configuration file. Unfortunately not—while the configuration files are similar, they aren’t the same. We’ll explore the difference in a while. But to get Fluent Bit running with our “Hello World” example, let’s start things with a configuration file previously prepared, using the command

fluent-bit -c ./Chapter2/FluentBit/HelloWorld.conf

As a result, Fluent Bit will start up with the configuration provided. Unlike Fluentd, Fluent Bit's support for HTTP is more recent and may not have all the features you want, depending on when you read this. Therefore, it may not be possible to match Fluentd for HTTP in our scenario of sending JSON. If you bump up against HTTP feature restrictions, you can at least drop down to using the TCP plugin (HTTP is a layer over the TCP protocol). Both Fluent Bit and Fluentd support HTTP operations for capturing status information and HTTP forwarding. The only downside of working at the TCP layer is that we can't use Postman to send the calls. You can create the same effect with other tools that know how to send text content to TCP sockets. For Linux, utilities such as nc (Netcat) can do this. In a Windows environment, there isn't the same native tooling, but it is possible to create a Telnet session using tools such as PuTTY (www.putty.org), and the LogSimulator includes the ability to send text log events to a TCP port. For Fluent Bit, let's use Postman for HTTP and the LogSimulator for TCP. Starting with TCP: as we have already installed the LogSimulator, we can start it in a separate shell (with the correct Java and Groovy versions set up), providing it with a properties file and a file of log events to send. We can run the command

groovy LogSimulator.groovy Chapter2\SimulatorConfig\fb-HelloWorld.properties .\TestData\small-source.json

We can now expect to see the shell running the LogSimulator reporting the sent events to the console. The log events will be sent at varying time intervals (the console should look something like the screenshot in figure 2.11).

Figure 2.11 Simulator console output at the end of the log event transmission

At the same time, Fluent Bit in the other console will start reporting the receipt and sending to its console the JSON payloads received. This is shown in figure 2.12.

Figure 2.12 Example Fluent Bit console output

You may have noticed a lag between the simulator starting and seeing Fluent Bit displaying events. This reflects that one of the configuration options is the time interval when the cache of received log messages is flushed to the output. As we will discover later in the book, this is one of the areas that we can tune to help performance.
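Under the hood, sending a log event to the TCP input is simply writing a newline-terminated JSON string to a socket. The following sketch shows the idea; it is illustrative rather than the LogSimulator's actual code, and the port in the example comment is an assumption to match whatever your Fluent Bit configuration listens on:

```python
import json
import socket

def send_tcp_log_event(record: dict, host: str, port: int) -> None:
    """Send one JSON log event, newline-terminated, to a TCP input plugin."""
    payload = (json.dumps(record) + "\n").encode("utf-8")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)

# Example (with Fluent Bit's tcp input listening locally):
#   send_tcp_log_event({"Hello": "World"}, "localhost", 28080)
```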

Now with HTTP

The difference between the TCP and HTTP configurations is small, so you can either make the changes to the Chapter2/FluentBit/HelloWorld.conf or use the provided configuration file Chapter2/FluentBit/HelloWorld-HTTP.conf. The following shows the changes that need to be applied:

  • In the Input section, change the Name tcp to Name http.

  • As we have been using port 18080 for HTTP in Postman, let’s correct the port in the configuration, replacing port 28080 with port 18080.

Save these changes once applied. To see how Fluent Bit will work now, stop the current Fluent Bit process if it’s still running. Then restart as before, or using the provided changes, start with

fluent-bit -c ./Chapter2/FluentBit/HelloWorld-HTTP.conf

Once running, use the same Postman settings to send the events as we did for Fluentd.

2.4.2 Alternate Fluent Bit startup options

Fluent Bit can also be configured entirely through the command line. This is an effective way to configure Fluent Bit, as it simplifies deployment (no mapping of configuration files needed). However, it comes at the price of readability. For example, we could repeat the same configuration of Fluent Bit with

fluent-bit -i tcp://0.0.0.0:28080 -o stdout

If you run this command with the simulator as previously set up, the outcomes will be the same as before. Fluent Bit, like Fluentd, isn't tied to working with a single source of log events. We can illustrate this by adding additional input definitions to the command line. In a Windows environment, let's add the winlog events to our inputs (Linux users can replace the winlog source with cpu). We can also ask Fluent Bit to tell us a bit more about what it is doing by repeating the same exercise with the command

fluent-bit -i tcp://0.0.0.0:28080 -i winlog -o stdout -vv

This time we will see several differences. First, when Fluent Bit starts up, it will give us a lot more information, including clearly showing the inputs and outputs being handled. This results from the -vv flag (more on this in the next section). As the log events occur, the winlog information will be interleaved with our log simulator events.

Fluentd and Fluent Bit internal logging levels

Both Fluentd and Fluent Bit support the same command-line parameters that can control how much information they log about their activities (as opposed to any log-level information associated with a log event received). In addition to being controlled by the command line, this configuration can be set via the configuration file. Both tools recognize five levels of logs, and when no parameter or configuration is applied, the midlevel (info) is used as the default log level. Table 2.5 shows the log levels, the command-line parameters, and the equivalent configuration setting. The easiest way to remember the command line is -v is for verbose and -q is for quiet; more letters increase verbosity or quietness.

Table 2.5 Log levels recognized by Fluentd and Fluent Bit

Log level    Command line    Configuration setting
Trace        -vv             Log_Level trace
Debug        -v              Log_Level debug
Info         (default)       Log_Level info
Warning      -q              Log_Level warn
Error        -qq             Log_Level error

NOTE Trace-level logging will occur only if Fluent Bit has been compiled with the build flag set to enable trace. This can be checked using the Fluent Bit help command (fluent-bit -h or fluent-bit --help) to display a list of the build flags and their settings. Trace-level logging should be needed only while developing a plugin.

2.4.3 Fluent Bit configuration file comparison

Previously we mentioned that the Fluentd and Fluent Bit configurations differ. To help illustrate the differences, table 2.6 offers the configuration side by side.

Table 2.6 Fluentd and Fluent Bit configuration comparison (using the HTTP configuration of Fluent Bit)

Fluent Bit:

# Hello World configuration will take events received
# on port 18080 using HTTP as a protocol

[SERVICE]
    Flush      1
    Daemon     Off
    Log_Level  info

# define the HTTP source which will provide log events
[INPUT]
    Name  http
    Host  0.0.0.0
    Port  18080

# accept all log events regardless of tag and write
# them to the console
[OUTPUT]
    Name  stdout
    Match *

Fluentd:

# Hello World configuration will take events received on port 18080 using
# HTTP as a protocol

# set Fluentd's configuration parameters
<system>
    Log_Level info
</system>

# define the HTTP source which will provide log events
<source>
    @type http
    port 18080
</source>    # a comment can also follow a directive

# accept all log events regardless of tag and write them to the console
<match *>
    @type stdout
</match>

If you want to play spot the difference, then you should have observed the following:

  • Rather than directives being defined by opening and closing angle brackets (<>), the directive is in square brackets ([]), and the termination is implicit by the following directive or end of the file.

  • SERVICE replaces the system directive for defining the general configuration.

  • @type is replaced by the Name attribute to define the plugin to be used.

  • Rather than match being the name of the directive with a parameter, the directive becomes OUTPUT, and the match condition is defined by a separate Match name-value pair in the attributes.

  • Older versions of Fluent Bit didn't support HTTP, so events would need to be sent using TCP, but the events received can still be in JSON format.

When looking at the configurations side by side, the details aren’t too radically different, but they are significant enough to catch people out.

2.4.4 Fluent Bit configuration file in detail

Looking more closely at the configuration file and the rules that are applied, we’ve just seen there are some similarities, and there are some differences. In the following listing, we have highlighted a few key rules.

Listing 2.2 Chapter2/FluentBit/HelloWorld.conf

# Hello World configuration will take events received 
# on port 18080 using TCP as a protocol
 
[SERVICE]                                
 
    Flush      1                         
 
    Daemon     Off                       
 
    Log_Level  info                      
 
# define the TCP source which will provide log events
[INPUT]                                  
 
    Name    tcp
    Listen  0.0.0.0
    Port    18080
 
# accept all log events regardless of tag and write
# them to the console
[OUTPUT]         
    Name  stdout
    Match *                              

All the Fluent Bit general configuration values are set in this section.

The Flush attribute controls how frequently Fluent Bit flushes its log cache to the output channels (stdout and stderr). In this case, we have set it to 1 second.

This tells the Fluent Bit startup whether the process should run as a daemon process.

Indentation is important in a configuration file and must be consistent. Recommended indentation is four space characters. Indentation, just like in a YAML file, indicates parent and child relationships. In this case, all these values are subservient to this input.

Rather than sources, Fluent Bit configuration uses the terminology of input and output.

In Fluent Bit, the controls on which log events pass through the plugin are determined not in the output declaration, as illustrated with Fluentd, but by a separate match attribute.

As with Fluentd, ordering within the configuration file is important, particularly in match statements—for example, if we added the following configuration fragment immediately before the current OUTPUT declaration:

[OUTPUT]
    Name file
    Path ./test.out
    Match *

Suppose the configuration appeared as follows:

# send all log events to a local file called test.out
[OUTPUT]
    Name file
    Path ./test.out
    Match *
 
# accept all log events regardless of tag and write
# them to the console
[OUTPUT]         
    Name  stdout
    Match *      

Should we expect logs to appear in the log file, stdout (i.e., the console), or both? The answer is that events will appear only in the file. Because both outputs match all events, the first output definition in the configuration (the file output, with its wildcard Match attribute) gets the events, and no log events make it to stdout.

2.4.5 Putting the dummy plugin into action

To test out some of the details, see if you can implement the following configuration change. Within both Fluentd and Fluent Bit is a built-in input plugin called dummy. Modify the respective HelloWorld.conf files and incorporate the source, and then start up Fluentd and Fluent Bit, in turn, to see what outcomes you get. The result of the exercise is included at the end of the chapter.

Answer

Rather than filling the pages with configuration files, the answer configurations can be found in the downloaded folders Chapter2/ExerciseResults/Fluentd/HelloWorld-Answer.conf and Chapter2/ExerciseResults/FluentBit/HelloWorld-Answer.conf.

2.5 Fluentd deployment with Kubernetes and containers

So far, we have looked at the deployment of Fluentd and Fluent Bit as you might approach the requirement with only minimal consideration to how the host is working (native deployments, virtualization, and containerization). We have referenced some of the mechanisms that would allow us to further automate or containerize these tools. As discussed in chapter 1, Fluentd has a strong association with containerization and the use of Kubernetes. We’ll briefly look at how Fluentd is configured in a Kubernetes context; when we get to part 3 of the book, we’ll look at details such as scaling and containerization in depth.

Establishing a deployment of a Kubernetes environment and containerization warrants its own book (we recommend Kubernetes in Action, 2nd edition by Marko Lukša; www.manning.com/books/kubernetes-in-action-second-edition). It is, however, worth looking at how things operate in principle; as we work through the configuration of Fluentd in the following chapters, you will be able to appreciate how the configuration could relate to a Kubernetes deployment. It may also prompt ideas on how and what you may wish to monitor with Fluentd when it comes to the microservices themselves.

2.5.1 Fluentd DaemonSet

Fluentd is one of the options for incorporating log management into a Kubernetes environment. This is typically achieved by defining configuration files. The Kubernetes configuration files tell Kubernetes how pods (collections of containers that work together) and containers should be run across one or more worker nodes (servers providing compute power to a Kubernetes cluster). Within Kubernetes, we can describe different ways for pods to be deployed, such as ReplicaSets, Jobs, and DaemonSets. For example, it is possible to define things such that a Fluentd container will be executed on each worker node to collect log events from all the local containers running on that node. This type of configuration within Kubernetes is known as a DaemonSet and is a typical configuration for Kubernetes to have for Fluentd. As we’ll see later in the book, this isn’t the only way to deploy Fluentd, nor are we limited to one deployment model. In the next listing, we can see an example DaemonSet configuration for applying a configuration file and parameters for routing log events to another Fluentd node.

Listing 2.3 Chapter2/Out-of-the-box Fluentd DaemonSet designed for forwarding

apiVersion: apps/v1   
 
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
    version: v1
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-logging
      version: v1
  template:
    metadata:
      labels:
        k8s-app: fluentd-logging
        version: v1
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master        
 
        effect: NoSchedule                         
 
      containers:                                  
 
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-forward
 
        env:                                       
 
          - name:  FLUENT_FOWARD_HOST
            value: "REMOTE_ENDPOINT"
          - name:  FLUENT_FOWARD_PORT
            value: "18080"
        resources:                                 
 
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:                              
 
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:                                     
 
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers

Tells Kubernetes whether the pod should run on the master (controlling node)

Tells Kubernetes this is something that must run continuously, rather than on a schedule

Identifies to Kubernetes the container image to be used, which will run Fluentd

This is where we start defining the containers within the pod. In addition to what is shown here, each container can have things done, such as defined startup commands—for example, tailoring each Fluentd instance.

It is also possible to have environment variables within the container instance set using these name-value pairs. In this case, several variables are being defined, which are then referenced within a configuration file to direct the forwarding plugin.

Resource quotas can be defined, so Fluentd doesn’t starve other processes running on a node of time. But this can have other consequences.

Describes a mount point within the container that can be used to access externalized storage

Describes where the container’s external storage would be on the underlying Kubernetes infrastructure. In an enterprise scenario, this could be a network storage device such as a SAN, or in the cloud it would be mapped to some form of block storage. This means we could map the logs that Fluentd generates to a shared location, and we could direct Fluentd instances to pick up a common configuration file.

NOTE The DaemonSet comes from http://mng.bz/nYea.

It should be noted that it is possible within an infrastructure hosting Kubernetes nodes to run processes such as Fluentd directly on the underlying platform. While this eliminates the abstraction layer of Kubernetes (and the associated overhead), it also removes the opportunity to use Kubernetes to manage and monitor that Fluentd is running. We recommend this only in very unusual circumstances.

NOTE DaemonSets are defined to provide basic operations on every worker node. This could be sending log events directly to Elasticsearch (as part of an EFK stack as discussed in chapter 1) or forwarding logs to various cloud vendor log analytics solutions, such as AWS CloudWatch. These can be found in the Fluentd GitHub (http://mng.bz/2jng).

Figure 2.13 illustrates how the Kubernetes configuration can work using a DaemonSet. Typically, the DaemonSet configuration would be held in a shared configuration repository or file system and then passed to Kubernetes through a tool like kubectl, the standard Kubernetes CLI tool. We have assumed that the Fluentd configuration resides on a shared file system and is therefore mounted by the Fluentd container to allow access. Another approach would be to pass the configuration using the DaemonSet YAML file or simply wire directly into the Docker image. The log consumers that the Fluentd configuration has within the DaemonSet could direct the log events to Elasticsearch, or to a file system outside the cluster that the Kubernetes configuration has made accessible. We will explore more about this when we get to scaling Fluentd.

Figure 2.13 A deployment model of Fluentd within Kubernetes as a DaemonSet. Each distinct server in the Kubernetes cluster has its own pod with a container running Fluentd.

2.5.2 Dockerized Fluentd

Like just about any application, in addition to installing manually or automating a manual install through tools like Ansible (www.ansible.com), it is possible to deploy Fluentd or Fluent Bit using the Docker container engine. Predefined Fluentd Docker files (i.e., the files that tell Docker how to build an executable image) are provided in the GitHub repository (https://github.com/fluent/fluentd-docker-image), which address different host OSs (e.g., from Debian to Windows). Fluent Bit also has a smaller set of predefined Docker files in GitHub (https://github.com/fluent/fluent-bit). The GitHub repositories contain the configuration files and scripts. The realized images are held in Docker Hub and can be found at https://hub.docker.com/u/fluent for Fluentd and https://hub.docker.com/r/fluent/fluent-bit for Fluent Bit.
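Running one of these prebuilt images can be sketched as follows. This assumes the published image's documented conventions (configuration read from /fluentd/etc, forward input on port 24224); the tag and local directory are placeholders.

```shell
# Run Fluentd in a container, mounting a local directory
# holding the configuration file into the image's config path
docker run -d --name fluentd \
  -p 24224:24224 -p 24224:24224/udp \
  -v $(pwd)/conf:/fluentd/etc \
  fluent/fluentd:edge
```

The -v mount is what lets several containers share a common configuration file, echoing the shared-configuration idea from the Kubernetes discussion.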

2.6 Using Fluentd UI

We have managed to install and run Fluentd and Fluent Bit, but in both cases, control has been through the command line. Fluentd can also run with a web UI. If installed, the web UI is served from the same process that executes Fluentd's core logic.

2.6.1 Installing Fluentd with UI

The installation will trigger Fluentd to download and install a series of additional gems, because the UI provides the means to incorporate several plugins beyond the basic ones. This does mean the installation takes longer than installing Fluentd alone. The commands to install the UI are

gem install -V fluentd-ui
fluentd-ui setup

Once the installation is complete, we can start the UI up with the following command:

fluentd-ui start

This will start up a Fluentd node, which includes a web server. The web UI is available on port 9292 (i.e., pointing your browser at http://localhost:9292 will present you with the login screen).

Securing Fluent-UI with HTTPS

The Fluentd UI is run using HTTP; no SSL/TLS certificate is used on a default installation. This is unlikely to be an issue in development/experiment environments. But running without SSL/TLS and at least basic credentials is far from recommended when it comes to production. This can be addressed in several ways:

  • Implement a reverse proxy in front of Fluentd UI using Nginx or the Apache HTTP Server, a common approach to securing web content not protected by SSL/TLS certificates (documentation on how to do this is available at http://mng.bz/Ywne). However, it means an additional process is running in your environment, and the networking must be configured so that the reverse proxy cannot be bypassed.

  • For its web layers, Fluentd UI uses the Ruby on Rails framework (https://rubyonrails.org/) and the Ruby application server Puma (https://puma.io). Therefore, it is possible to configure Puma with an SSL/TLS certificate. Applying the configuration needs Ruby code changes and startup parameters, with knock-on effects for the Fluentd UI code base. This is undesirable, as any update will mean reapplying those changes.

  • We wouldn’t recommend the use of Fluentd UI in production. This may seem like avoiding a problem rather than addressing it. However, there is a lot of merit in this. For production environments, you want to have Fluentd configuration files controlled through tools such as Git. This means not empowering users with a UI in production that can make configuration changes. It is better to get users to make controlled changes that can then be rolled out securely. If you’re running Fluentd in a microservices or distributed environment, allowing changes only from the controlled configuration file provides the means to drive environment consistency and reduce the chance of “configuration drift.”

  • Again, we recommend using Fluentd UI only for experimentation purposes and not in production. Given this, the following will provide enough insight to enable you to appreciate what the UI supports.
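The reverse-proxy option described above can be sketched in Nginx terms as follows. The hostname and certificate paths are placeholders for illustration, not values from this book.

```
# Hypothetical Nginx reverse proxy terminating TLS in front of Fluentd UI
server {
    listen 443 ssl;
    server_name fluentd-ui.example.com;           # placeholder hostname

    ssl_certificate     /etc/nginx/certs/ui.crt;  # placeholder paths
    ssl_certificate_key /etc/nginx/certs/ui.key;

    location / {
        proxy_pass http://localhost:9292;         # Fluentd UI default port
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
    }
}
```

For this to be effective, port 9292 should be firewalled so it is reachable only from the proxy itself.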

By default, the login username is admin, and the password is changeme. Once logged in, the UI presented will look something like figure 2.14. Differences can occur as the UI has reactive and responsive characteristics, resulting in the layout adjusting based on the device used to view the UI.

Figure 2.14 UI when Fluentd UI starts without any configuration

We need to provide some configuration values for the Fluentd node to operate with. Clicking Setup Fluentd will take us to a UI through which we can configure the behavior. Figure 2.15 illustrates some of the relevant configuration needed.

Figure 2.15 Fluentd UI for setting the configuration locations

The configuration fields are set with default values. Switch the Config File option to point to the existing HelloWorld.conf file used to run Fluentd. You may wish to also provide alternative locations for the process identifier (PID) and log files. As soon as we click the Create button in the UI, the server process will start if the locations and files can be written to and read from. The UI then switches to a different home page, as shown in figure 2.16.
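As a reminder, the HelloWorld.conf referenced here might look like the following minimal sketch; the actual file introduced earlier in the chapter may differ in detail.

```
# Minimal "Hello World" style configuration sketch
<source>
  @type http          # accept log events over HTTP
  port 18080          # assumed port for illustration
</source>

<match *>
  @type stdout        # echo every matched event to the console
</match>
```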

Figure 2.16 Fluentd UI once the backend is running

The navigation menu on the left is now a lot richer. The Fluentd submenu provides options for working with the configuration file and accessing the standard and error logs. The logs shown are the same as the console output. The navigation menu also lets us see the details of Installed Plugins, Recommended Plugins, and Updated Plugins.

The core of the screen is given over to the live log being produced by the server, with controls for starting and stopping operations and the current configuration. The Config File options show the configuration file being used and provide the ability to edit it directly; if the UI options for configuration become an issue, you can resort to traditional editing. The Add Source and Output options provide web pages that capture plugin configurations, using the UI as a guided, form-based presentation for modifying configuration values. As figure 2.17 illustrates, the UI provides a logical flow for setting up the plugins and their configuration values.

Figure 2.17 Fluentd UI defining inputs and outputs

Clicking on one of the Source, Filter, or Output elements will navigate you to a UI for configuring that type of plugin. For example, selecting a File source presents you with a file picker UI (as shown in figure 2.18).

Figure 2.18 Fluentd UI file picker as part of the File plugin configuration

Summary

  • Log events are composed of a tag, a timestamp, and a record that holds the core log event.

  • Using NTP for machine time synchronization is crucial when bringing multiple server logs together to ensure correct log ordering.

  • Fluentd and Fluent Bit can be deployed in most environments, as infrastructure requirements are very small and application dependencies are minimal. If necessary, you can compile these tools to work in niche situations.

  • There are a variety of ways of deploying Fluentd, including installing Ruby and RubyGems and then retrieving Fluentd as a gem.

  • Deployment of the LogSimulator to quickly mimic sources of log events just requires Java, but to customize the tool, you need Groovy as well.

  • Fluentd can be used with Kubernetes and Docker logging, as well as with traditional environments. We can retrieve standard configurations for this for Kubernetes deployment.

  • When deployed on a Linux host, Fluentd can respond to signals such as SIGINT to shut down gracefully and SIGUSR2 to reload the configuration file.

  • Fluentd UI is one of the additional tools available with Fluentd. This provides a web front to visualize the configuration of a Fluentd environment and observe what Fluentd is doing. Other tools include the ability to generate certificates and list available plugins.

  • The order of how configurations are defined in a configuration file is important.

  • Fluentd’s and Fluent Bit’s own logging can be configured to different log levels.

  • Fluentd and Fluent Bit configurations are similar but not the same.
